1. REAL PARTY IN INTEREST

The real party in interest is Genentech, Inc., South San Francisco, California, by an 
assignment of the patent application U.S. Serial No. 09/918,585 recorded July 30, 2001, at Reel 
012095 and Frame 0677. 

2. RELATED APPEALS AND INTERFERENCES 

There are no related appeals or interferences known to Appellants, Appellants' legal 
representative, or Appellants' assignee that will directly affect or be directly affected by or have a 
bearing on the Board's decision in the present appeal. 

3. STATUS OF CLAIMS 

Claims 58-62 are in this application. 
Claims 1-57 and 63 are canceled. 

Claims 58-62 stand rejected and Appellants appeal the rejection of these claims. 
A copy of the rejected claims involved in the present Appeal is provided in the Claims 
Appendix. 

4. STATUS OF AMENDMENTS 

There were no amendments to the claims submitted after final rejection. All previous 
amendments to the claims have been entered. 

5. SUMMARY OF CLAIMED SUBJECT MATTER 

The invention claimed in the present application concerns an isolated antibody that 
specifically binds to the polypeptide of SEQ ID NO:7 (Claim 58). The invention further provides 
monoclonal antibodies (Claim 59), humanized antibodies (Claim 60), antibody fragments (Claim 
61), and labeled antibodies (Claim 62) that specifically bind to the polypeptide of SEQ ID NO:7. 

Support for the preparation and uses of antibodies is found throughout the specification, 
including, for example, pages 217-225. The preparation of antibodies is described in Example 
104, while Example 106 describes the use of the antibodies for purifying the polypeptides to 
which they bind. Isolated antibodies are defined in the specification at page 132, lines 29-38.
Support for monoclonal antibodies is found in the specification at, for example, page 217, line 
30, to page 219, line 11, and Example 104. Support for humanized antibodies is found in the 
specification at, for example, page 219, line 12, to page 220, line 14. Support for antibody 
fragments is found in the specification at, for example, page 131, line 29, to page 132, line 22, 
and page 221, lines 6-34. Support for labeled antibodies is found in the specification at, for 
example, page 133, lines 1-4, and page 224, line 35, to page 225, line 4. 

The polypeptide of SEQ ID NO:7 is designated PRO274, and its amino acid sequence is
shown in Figure 4, while the encoding nucleic acid sequence (SEQ ID NO:6) is shown in Figure
3. The specification discloses that various portions of the PRO274 polypeptide possess
significant sequence similarity to the seven transmembrane receptor proteins (see, for example,
page 2, line 27 to page 3, line 6). The isolation of cDNA clones encoding PRO274 of SEQ ID
NO:7 is described in Example 4. Examples 100-103 describe the expression of PRO
polypeptides in various host cells, including E. coli, mammalian cells, yeast and Baculovirus-
infected insect cells. Finally, Example 114, in the specification at page 331, line 23, to page 346,
line 4, sets forth a Gene Amplification assay which shows that the PRO274 gene is amplified in
the genome of certain human lung and colon cancers (see Table 9).

The specification discloses that antibodies to PRO polypeptides may be used, for 
example, in purification of PRO (page 225, lines 5-11 and Example 106), in diagnostic assays for 
PRO expression (page 190, lines 3-9, and page 224, line 21 to page 225, line 4), as antagonists to 
PRO (page 198, lines 3-6), and as elements of pharmaceutical compositions for the treatment of 
various disorders (page 223, line 30, to page 224, line 28). 

6. GROUNDS OF REJECTION TO BE REVIEWED ON APPEAL 

I. Whether Claims 58-62 satisfy the utility requirement of 35 U.S.C. §101. 

II. Whether Claims 58-62 satisfy the enablement requirement of 35 U.S.C. §112, first
paragraph.

III. Whether Claims 58-62 are patentable under 35 U.S.C. §102(b) over Ho et al.,
Science, Vol. 289, pp 265-270.

IV. Whether Claims 59-62 are patentable under 35 U.S.C. §103(a) over Ho et al., in
view of Janeway et al.

7. ARGUMENT 
Summary of the Arguments:

Issue I: Utility 

Patentable utility of the PRO274 polypeptide and the antibodies which bind it is based
upon the gene amplification data for the gene encoding the PRO274 polypeptide. The
specification discloses that the gene encoding PRO274 showed significant amplification, ranging
from 2.0- to 3.1-fold, in three different lung primary tumors. The Declaration of Dr. Audrey
Goddard, submitted with Appellants' Response filed September 14, 2004, explains that a gene
identified as being amplified at least 2-fold by the disclosed gene amplification assay in a tumor
sample relative to a normal sample is useful as a marker for the diagnosis of cancer, for
monitoring cancer development and/or for measuring the efficacy of cancer therapy.
Accordingly, the Examiner's assertion that "the specification provides data showing a very small
increase in DNA copy number, approximately 2-fold, in a few tumor samples for PRO274"
(Page 4 of the Office Action mailed July 19, 2005), is both factually and scientifically incorrect.
By referring to the 2.0-fold to 3.1-fold amplification of the PRO274 gene in lung tumors as "very
small," the Examiner ignores the teachings of an expert declaration without any basis and without
presenting any evidence to the contrary.

The Examiner has asserted that "it does not necessarily follow that an increase in
gene copy number results in increased gene expression and increased protein expression, such
that antibodies would be useful diagnostically." (Page 7 of the Office Action mailed May 20,
2004; emphasis added). In support of this assertion, the Examiner cited two references by
Pennica et al. and Gygi et al. The Examiner has further cited Hu et al. in support of the
assertion that "the literature cautions researchers from drawing conclusions based on small
changes in transcript expression levels between normal and cancerous tissue." (Page 7 of the
Office Action mailed July 19, 2005).

The Examiner's reference to the lack of necessary correlation or accurate prediction in some
of the rejections (as Appellants will discuss in the detailed arguments) clearly shows that the 
Examiner applied an improper legal standard when making this rejection. The evidentiary 
standard to be used throughout ex parte examination in setting forth a rejection is a 
preponderance of the totality of the evidence under consideration. Thus, to overcome the 
presumption of truth that an assertion of utility by the applicant enjoys, the Examiner must 
establish that it is more likely than not that one of ordinary skill in the art would doubt the truth 
of the statement of utility. Only after the Examiner has made a proper prima facie showing of 
lack of utility, does the burden of rebuttal shift to the applicant. 

In contrast, Appellants have submitted ample evidence to show that, in general, if a gene
is amplified in cancer, it is more likely than not that the encoded protein will be expressed at an
elevated level. First, the articles by Orntoft et al., Hyman et al., and Pollack et al. (made of
record in Appellants' Response filed September 14, 2004) collectively teach that, in general, gene
amplification increases mRNA expression. Second, the Declaration of Dr. Paul Polakis,
principal investigator of the Tumor Antigen Project of Genentech, Inc., the assignee of the
present application, shows that, in general, there is a correlation between mRNA levels and
polypeptide levels. Appellants further note that the sale of gene expression chips to measure
mRNA levels is a highly successful business, with a company such as Affymetrix recording
168.3 million dollars in sales of their GeneChip arrays in 2004. Clearly, the research community
believes that the information obtained from these chips is useful (i.e., that it is more likely than
not informative of the protein level).

Taken together, although there are some examples in the scientific art that do not fit
within the central dogma of molecular biology that there is a correlation between DNA, mRNA,
and polypeptide levels, these instances are exceptions rather than the rule. In the majority of
amplified genes, as exemplified by Orntoft et al., Hyman et al., Pollack et al., and the Polakis
Declaration, the teachings in the art overwhelmingly show that gene amplification influences
gene expression at the mRNA and protein levels. Therefore, one of skill in the art would
reasonably expect in this instance, based on the amplification data for the PRO274 gene, that the
PRO274 polypeptide is concomitantly overexpressed. Thus, the claimed antibodies that bind the
PRO274 polypeptide have utility in the diagnosis of cancer.

Even if there is no correlation between gene amplification and increased mRNA/protein
expression (which Appellants expressly do not concede), a polypeptide encoded by a gene that is
amplified in cancer would still have a specific, substantial, and credible utility. As evidenced by
the Ashkenazi Declaration and the teachings of Hanna and Mornin, simultaneous testing of gene
amplification and gene product over-expression enables more accurate tumor classification, even
if the gene product, the protein, is not over-expressed. This leads to better determination of a
suitable therapy for the tumor, as demonstrated by the real-world example of the breast cancer
marker HER-2/neu.

Accordingly, Appellants submit that when the proper legal standard is applied, one
should reach the conclusion that the present application discloses at least one patentable utility
for the PRO274 polypeptide and the claimed antibodies which bind it.

Issue II: Enablement 

Claims 58-62 stand rejected under 35 U.S.C. §112, first paragraph, allegedly "since the 
claimed invention is not supported by either a specific and substantial asserted utility or a well 
established utility for the reasons set forth above, one skilled in the art clearly would not know how 
to use the claimed invention." (Page 9 of the Office Action mailed July 19, 2005). 

Appellants submit that, as discussed above, the PRO274 polypeptide and the antibodies
that bind it have utility in the diagnosis of cancer. Based on such a utility, one of skill in the art 
would know exactly how to use the claimed antibodies for diagnosis of cancer, without any 
undue experimentation. 

Issue III: Anticipation by Ho et al.

Claims 58-62 stand rejected under 35 U.S.C. § 102(b) as being anticipated by Ho et al.,
Science, Vol. 289, pp 265-270, published July 14, 2000.

The instant application claims priority to International Application No.
PCT/US00/03565, which first disclosed the gene amplification results and was filed
February 11, 2000, over five months before the publication date of Ho et al. The instant
application has not been granted the earlier priority date on the grounds that "the gene
amplification assay fails to disclose a patentable utility for the antibodies to the protein." (Page
10 of the Office Action mailed July 19, 2005). Appellants respectfully submit that as discussed
above under Issues I and II, the presently claimed invention is supported by a specific, substantial
and credible utility and, therefore, the present specification teaches one of ordinary skill in the art
"how to use" the claimed invention without undue experimentation. Accordingly, the instant
application is entitled to the effective filing date of February 11, 2000, and thus Ho et al. is not
prior art.

Issue IV: Obviousness over Ho et al. in view of Janeway et al.

Claims 59-62 stand rejected under 35 U.S.C. § 103(a) as being unpatentable over Ho et al.
in view of Immunology, The Immune System in Health and Disease, Third Edition, Janeway and
Travers, Ed., 1997.

As discussed above, the instant application is entitled to priority to International
Application No. PCT/US00/03565, and to the effective filing date of February 11, 2000. Thus
Ho et al. is not prior art.

These arguments are all discussed in further detail below under the appropriate headings. 

ISSUE I: Claims 58-62 satisfy the utility requirement of 35 U.S.C. §101 

Claims 58-62 stand rejected under 35 U.S.C. §101 because allegedly "the claimed 
invention is not supported by either a specific and substantial asserted utility or a well established 
utility." (Page 2 of the Office Action mailed July 19, 2005). 

Appellants submit, for the reasons set forth below, that the specification discloses at least
one credible, substantial and specific asserted utility for the claimed antibodies that bind the
PRO274 polypeptide.

A. The Legal Standard for Utility 

According to 35 U.S.C. §101: 

Whoever invents or discovers any new and useful process, machine, manufacture, or 
composition of matter, or any new and useful improvement thereof, may obtain a 
patent therefor, subject to the conditions and requirements of this title. (Emphasis 
added.)

In interpreting the utility requirement, in Brenner v. Manson 1 the Supreme Court held that
the quid pro quo contemplated by the U.S. Constitution between the public interest and the 
interest of the inventors required that a patent applicant disclose a "substantial utility" for his or 
her invention, i.e. a utility "where specific benefit exists in currently available form." 2 The Court 
concluded that "a patent is not a hunting license. It is not a reward for the search, but 
compensation for its successful conclusion. A patent system must be related to the world of 
commerce rather than the realm of philosophy." 3 

Later, in Nelson v. Bowler 4 the C.C.P.A. acknowledged that tests evidencing
pharmacological activity of a compound may establish practical utility, even though they may not 
establish a specific therapeutic use. The court held that "since it is crucial to provide researchers 
with an incentive to disclose pharmaceutical activities in as many compounds as possible, we 
conclude adequate proof of any such activity constitutes a showing of practical utility." 5 

In Cross v. Iizuka 6 the C.A.F.C. reaffirmed Nelson, and added that in vitro results might 
be sufficient to support practical utility, explaining that "in vitro testing, in general, is relatively 
less complex, less time consuming, and less expensive than in vivo testing. Moreover, in vitro 
results with the particular pharmacological activity are generally predictive of in vivo test results, 
i.e. there is a reasonable correlation therebetween." 7 The court perceived "no insurmountable
difficulty" in finding that, under appropriate circumstances, "in vitro testing, may establish a 
practical utility." 8 



1 Brenner v. Manson, 383 U.S. 519, 148 U.S.P.Q. (BNA) 689 (1966). 

2 Id. at 534, 148 U.S.P.Q. (BNA) at 695. 

3 Id. at 536, 148 U.S.P.Q. (BNA) at 696. 

4 Nelson v. Bowler, 626 F.2d 853, 206 U.S.P.Q. (BNA) 881 (C.C.P.A. 1980). 

5 Id. at 856, 206 U.S.P.Q. (BNA) at 883. 

6 Cross v. Iizuka, 753 F.2d 1047, 224 U.S.P.Q. (BNA) 739 (Fed. Cir. 1985). 

7 Id. at 1050, 224 U.S.P.Q. (BNA) at 747. 

8 Id.

The case law has also clearly established that applicants' statements of utility are usually
sufficient, unless such statement of utility is unbelievable on its face. 9 The PTO has the initial
burden to prove that applicants' claims of usefulness are not believable on their face. 10 In
general, an Applicant's assertion of utility creates a presumption of utility that will be sufficient
to satisfy the utility requirement of 35 U.S.C. §101, "unless there is a reason for one skilled in the
art to question the objective truth of the statement of utility or its scope." 11, 12

Compliance with 35 U.S.C. §101 is a question of fact. 13 The evidentiary standard to be 
used throughout ex parte examination in setting forth a rejection is a preponderance of the 
totality of the evidence under consideration. 14 Thus, to overcome the presumption of truth that 
an assertion of utility by the applicant enjoys, the Examiner must establish that it is more likely 
than not that one of ordinary skill in the art would doubt the truth of the statement of utility. 
Only after the Examiner has made a proper prima facie showing of lack of utility, does the burden of
rebuttal shift to the applicant. The issue will then be decided on the totality of evidence. 

The well established case law is clearly reflected in the Utility Examination Guidelines 
("Utility Guidelines") 15 , which acknowledge that an invention complies with the utility 
requirement of 35 U.S.C. §101, if it has at least one asserted "specific, substantial, and credible 
utility" or a "well-established utility." Under the Utility Guidelines, a utility is "specific" when it 
is particular to the subject matter claimed. For example, it is generally not enough to state that a 
nucleic acid is useful as a diagnostic without also identifying the conditions that are to be 
diagnosed. 

9 In re Gazave, 379 F.2d 973, 154 U.S.P.Q. (BNA) 92 (C.C.P.A. 1967). 

10 Ibid.

11 In re Langer, 503 F.2d 1380, 1391, 183 U.S.P.Q. (BNA) 288, 297 (C.C.P.A. 1974).

12 See also In re Jolles, 628 F.2d 1322, 206 U.S.P.Q. 885 (C.C.P.A. 1980); In re Irons, 340 F.2d 974, 144
U.S.P.Q. 351 (1965); In re Sichert, 566 F.2d 1154, 1159, 196 U.S.P.Q. 209, 212-13 (C.C.P.A. 1977).

13 Raytheon v. Roper, 724 F.2d 951, 956, 220 U.S.P.Q. (BNA) 592, 596 (Fed. Cir. 1983), cert. denied, 469
U.S. 835 (1984).

14 In re Oetiker, 977 F.2d 1443, 1445, 24 U.S.P.Q.2d (BNA) 1443, 1444 (Fed. Cir. 1992). 

15 66 Fed. Reg. 1092 (2001).

In explaining the "substantial utility" standard, M.P.E.P. §2107.01 cautions, however,
that Office personnel must be careful not to interpret the phrase "immediate benefit to the public" 
or similar formulations used in certain court decisions to mean that products or services based on 
the claimed invention must be "currently available" to the public in order to satisfy the utility 
requirement. "Rather, any reasonable use that an applicant has identified for the invention that 
can be viewed as providing a public benefit should be accepted as sufficient, at least with regard 
to defining a 'substantial' utility." 16 Indeed, the Guidelines for Examination of Applications for 
Compliance With the Utility Requirement, 17 gives the following instruction to patent examiners: 
"If the applicant has asserted that the claimed invention is useful for any particular practical 
purpose ... and the assertion would be considered credible by a person of ordinary skill in the 
art, do not impose a rejection based on lack of utility." 

B. The Data and Documentary Evidence Supporting a Patentable Utility 

Appellants respectfully submit that they rely on the gene amplification data for the
patentable utility of the claimed antibodies that bind the PRO274 polypeptide, and that the gene
amplification data for the gene encoding the PRO274 polypeptide are clearly disclosed in the
instant specification under Example 114.

It was well known in the art at the time the invention was made that gene amplification is
an essential mechanism for oncogene activation. The gene amplification assay is well-described
in Example 114 of the present application. Example 114 discloses that the inventors isolated
genomic DNA from a variety of primary cancers and cancer cell lines that are listed in Table 9,
including primary lung and colon tumors of the type and stage indicated in Table 8. As a
negative control, DNA was isolated from the cells of ten normal healthy individuals and
pooled. Gene amplification was monitored using real-time quantitative
TaqMan™ PCR. Table 9 shows the resulting gene amplification data. Further, Example 114
explains that the results of TaqMan™ PCR are reported in ΔCt units, wherein one unit
corresponds to one PCR cycle or approximately a 2-fold amplification relative to control, two
units correspond to 4-fold amplification, 3 units to 8-fold amplification, etc.

16 M.P.E.P. §2107.01.

17 M.P.E.P. §2107 II (B)(1).

Appellants respectfully submit that a ΔCt value of at least 1.0, which is a more than 2-fold
increase, was observed for PRO274 in primary lung tumors LT4, LT16, and LT18.
PRO274 showed approximately 1.00-1.61 ΔCt units, which corresponds to 2^1.00 to 2^1.61 fold, or
2.0- to 3.1-fold, amplification in three different human primary lung tumors. Accordingly, the present
specification clearly discloses overwhelming evidence that the gene encoding the PRO274
polypeptide is significantly amplified in a number of lung tumors.
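
For illustration only, the ΔCt-to-fold conversion described in Example 114 can be sketched as below. This sketch is not part of the specification or the record. It assumes ideal PCR efficiency, so that each ΔCt unit corresponds to exactly one doubling, and the function name fold_from_delta_ct is hypothetical.

```python
def fold_from_delta_ct(delta_ct: float) -> float:
    """Convert a delta-Ct value (in PCR cycles) to approximate fold amplification.

    Assumption: ideal amplification efficiency, i.e. one cycle = one doubling,
    so fold = 2 ** delta_ct.
    """
    return 2.0 ** delta_ct


# The delta-Ct range reported for PRO274, plus the 2- and 3-unit
# reference points mentioned in the description of Example 114.
for delta_ct in (1.00, 1.61, 2.0, 3.0):
    fold = fold_from_delta_ct(delta_ct)
    print(f"delta-Ct = {delta_ct:.2f} -> about {fold:.1f}-fold")
```

Under that assumption, the reported range of 1.00 to 1.61 ΔCt units maps to roughly 2.0- to 3.1-fold amplification, which matches the figures stated above.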

It is also well known that gene amplification occurs in most solid tumors, and generally is 
associated with poor prognosis. 

In support, Appellants have submitted, in their Response filed September 14, 2004, a
Declaration by Dr. Audrey Goddard. Appellants particularly draw the Board's attention to page 3
of the Goddard Declaration, which clearly states that:

It is further my considered scientific opinion that an at least 2-fold increase in 
gene copy number in a tumor tissue sample relative to a normal (i.e., non-tumor)
sample is significant and useful in that the detected increase in gene copy 
number in the tumor sample relative to the normal sample serves as a basis for 
using relative gene copy number as quantitated by the TaqMan PCR technique 
as a diagnostic marker for the presence or absence of tumor in a tissue sample of 
unknown pathology. Accordingly, a gene identified as being amplified at least 
2-fold by the quantitative TaqMan PCR assay in a tumor sample relative to a 
normal sample is useful as a marker for the diagnosis of cancer, for 
monitoring cancer development and/or for measuring the efficacy of cancer 
therapy. (Emphasis added). 

As indicated above, the gene encoding the PRO274 polypeptide shows significantly
higher than a two-fold amplification in three different lung tumors. In addition, the Goddard
Declaration clearly establishes that the TaqMan real-time PCR method described in Example 114
has gained wide recognition for its versatility, sensitivity and accuracy, and is in extensive use for
the study of gene amplification. The facts disclosed in the Declaration also confirm that based
upon the gene amplification results, one of ordinary skill would find it credible that PRO274 is a
diagnostic marker of lung cancer.

The Examiner has asserted that "the specification provides data showing a very small
increase in DNA copy number, approximately 2-fold, in a few tumor samples for PRO274"
(Page 4 of the Office Action mailed July 19, 2005). The Examiner further asserts that "it was
imperative to find evidence in the relevant scientific literature whether or not a small increase in
DNA copy number would be considered by the skilled artisan to be predictive of increased
mRNA and polypeptide levels." (Page 4 of the Office Action mailed July 19, 2005).

Appellants respectfully submit that the Examiner seems to be applying a heightened
utility standard in this instance, which is legally incorrect. Appellants have shown that the gene
encoding PRO274 demonstrated significant amplification, from 2.0- to 3.1-fold, in three lung
tumors. As explained in the Declaration of Dr. Audrey Goddard (submitted with the Response
filed September 14, 2004):

It is further my considered scientific opinion that an at least 2-fold increase in 
gene copy number in a tumor tissue sample relative to a normal (i.e., non-tumor) 
sample is significant and useful in that the detected increase in gene copy 
number in the tumor sample relative to the normal sample serves as a basis for 
using relative gene copy number as quantitated by the TaqMan PCR technique 
as a diagnostic marker for the presence or absence of tumor in a tissue sample of 
unknown pathology. (Emphasis added). 

By referring to the 2.0-fold to 3.1-fold amplification of the PRO274 gene in lung tumors
as "very small," the Examiner appears to ignore the teachings within an expert's declaration
without any basis and without presenting any evidence to the contrary. Appellants respectfully
draw the Examiner's attention to the Utility Examination Guidelines (Part IIB, 66 Fed. Reg. 1098
(2001)) which state that:

"Office personnel must accept an opinion from a qualified expert that is based 
upon relevant facts whose accuracy is not being questioned; it is improper to 
disregard the opinion solely because of a disagreement over the significance or 
meaning of the facts offered". 

Thus, barring evidence to the contrary, Appellants maintain that the 2.0- to 3.1-fold
amplification disclosed for the PRO274 gene is significant and forms the basis for the utility
claimed herein.

The Examiner has further asserted that "[g]iven that PRO274 was amplified in only a
very small number of tumors of the same type, the data do not support the implicit conclusion of
the specification that PRO274 shows a positive correlation with lung cancer, much less that the
levels of PRO274 would be diagnostic of such." (Page 6 of the Office Action mailed May 20,
2004).

Appellants emphasize that they have shown significant DNA amplification in three of
the lung tumor samples in Table 9, Example 114 of the instant specification. The fact that not all
lung tumors tested positive in this study does not make the gene amplification data less
significant. As any skilled artisan in the field of oncology would easily appreciate, not all tumor
markers are generally associated with every tumor, or even with most tumors. For example, the
article by Hanna and Mornin (submitted with the Response filed September 14, 2004) discloses
that the known breast cancer marker HER-2/neu is "amplified and/or overexpressed in 10%-30%
of invasive breast cancers and in 40%-60% of intraductal breast carcinoma" (page 1, col. 1). In
fact, some tumor markers are useful for identifying rare malignancies. That is, the association of
the tumor marker with a particular type of tumor lesion may be rare, or the occurrence of that
particular kind of tumor lesion itself may be rare. In either event, even these rare tumor markers,
which do not give a positive hit for most common tumors, have great value in tumor diagnosis
and, consequently, in tumor prognosis. The skilled artisan would certainly know that such tumor
markers are useful for better classification of tumors. Therefore, whether the PRO274 gene is
amplified in three lung tumors or in all lung tumors is not relevant to its identification as a tumor
marker, or its patentable utility. Rather, the fact that the amplification data for PRO274 is
considered significant is what lends support to its usefulness as a tumor marker.

The Examiner has asserted that "[t]he data presented in the specification were not 
corrected for aneuploidy" and cites a reference by Sen et al. in support of the assertion that "[a] 
slight amplification of a gene does not necessarily mean overexpression in a cancer tissue, but 
can merely be an indication that the cancer tissue is aneuploid." (Page 6 of the Office Action 
mailed May 20, 2005). 

Appellants submit that it is known in the art that detection of gene amplification can be 
used for cancer diagnosis regardless of whether the increase in gene copy number results from 
intrachromosomal changes or from chromosomal aneuploidy. As explained by Dr. Ashkenazi in 
his Declaration (submitted with Appellants' Response filed September 14, 2004):

An increase in gene copy number can result not only from intrachromosomal
changes but also from chromosomal aneuploidy. It is important to understand that 
detection of gene amplification can be used for cancer diagnosis even if the 
determination includes measurement of chromosomal aneuploidy. Indeed, as long 
as a significant difference relative to normal tissue is detected, it is irrelevant if 
the signal originates from an increase in the number of gene copies per 
chromosome and/or an abnormal number of chromosomes. 

Hence, Appellants submit that amplification of a gene, whether by aneuploidy or any other
mechanism, is useful as a diagnostic marker. 

The Examiner has asserted that "[o]ne skilled in the art would do further research to 
determine whether or not the PRO274 polypeptide levels increased significantly in the tumor
samples. The requirement for such further research makes it clear that the asserted utility is not 
yet in currently available form, i.e., it is not substantial." (Page 4 of the Office Action mailed 
July 19, 2005). 

As discussed above, M.P.E.P. §2107.01 cautions Office personnel not to interpret the 
phrase "immediate benefit to the public" or similar formulations used in certain court decisions 
to mean that products or services based on the claimed invention must be "currently available" to 
the public in order to satisfy the utility requirement. "Rather, any reasonable use that an 
applicant has identified for the invention that can be viewed as providing a public benefit should 
be accepted as sufficient, at least with regard to defining a 'substantial' utility." 18 Indeed, the 
Guidelines for Examination of Applications for Compliance With the Utility Requirement, 19 
gives the following instruction to patent examiners: "If the applicant has asserted that the claimed 
invention is useful for any particular practical purpose ... and the assertion would be considered 
credible by a person of ordinary skill in the art, do not impose a rejection based on lack of 
utility." 

Appellants' position is based on the overwhelming evidence from gene amplification data
disclosed in the specification, which clearly indicate that the gene encoding PRO274 is
significantly amplified in certain lung tumors. Based on the working hypothesis among those
skilled in the art that if a gene is amplified in cancer, the encoded protein is likely to be expressed
at an elevated level, one skilled in the art would simply accept that since the PRO274 gene is
amplified, the PRO274 polypeptide would more likely than not be over-expressed. Thus data
relating to PRO274 polypeptide expression may be used for the same diagnostic and prognostic
purposes as data relating to PRO274 gene expression. Therefore, based on the disclosure in the
specification, no further research would be necessary to determine how to use the claimed
antibodies that bind to the PRO274 polypeptide, because the current invention is fully enabled by
the disclosure of the present application.

18 M.P.E.P. §2107.01.

19 M.P.E.P. §2107 II(B)(1).

Accordingly, Appellants submit that based on the general knowledge in the art at the time
the invention was made and the teachings in the specification, the specification provides clear
guidance as to how to interpret and use the data relating to PRO274 polypeptide expression and
that the claimed antibodies which bind the PRO274 polypeptide have utility in the diagnosis of
cancer.

C. A prima facie case of lack of utility has not been established 

The Examiner has asserted that "it does not necessarily follow that an increase in gene 
copy number results in increased gene expression and increased protein expression, such that 
antibodies would be useful diagnostically." (Page 7 of the Office Action mailed May 20, 2004). 
In support of these assertions, the Examiner referred to Pennica et al and contended that 
"Pennica et al was cited as evidence showing a lack of correlation between gene (DNA) 
amplification and mRNA levels." (Page 4 of the Office Action mailed July 19, 2005). The 
Examiner further referred to Gygi et al, and asserted that "Gygi et al was cited as providing 
evidence that polypeptide levels cannot be accurately predicted from mRNA levels." (Page 4 of 
the Office Action mailed July 19, 2005). 

As a preliminary matter, Appellants respectfully submit that it is not a legal requirement 
to establish that gene amplification "necessarily" results in increased expression at the mRNA 
and polypeptide levels, or that protein levels can be "accurately predicted." As discussed above, 
the evidentiary standard to be used throughout ex parte examination of a patent application is a 
preponderance of the totality of the evidence under consideration. Accordingly, Appellants 
submit that in order to overcome the presumption of truth that an assertion of utility by the 
applicant enjoys, the Examiner must establish that it is more likely than not that one of ordinary 
skill in the art would doubt the truth of the statement of utility. Therefore, it is not legally 
required that there be a "necessary" correlation between the data presented and the claimed 
subject matter. The law requires only that one skilled in the art should accept that such a 
correlation is more likely than not to exist. Appellants respectfully submit that when the proper 
evidentiary standard is applied, a correlation must be acknowledged. 

The Examiner cited the abstract of Pennica et al. for its disclosure that "WISP-1 gene 
amplification and overexpression in human colon tumors showed a correlation between DNA 
amplification and over-expression, whereas overexpression of WISP-3 RNA was seen in the 
absence of DNA amplification. In contrast, WISP-2 DNA was amplified in colon tumors, but its 
mRNA expression was significantly reduced in the majority of tumors compared with expression 
in normal colonic mucosa from the same patient." From this, the Examiner correctly concluded 
that increased copy number does not necessarily result in increased polypeptide expression. The 
standard, however, is not absolute certainty. 

In fact, as noted even in Pennica et al, "[a]n analysis of WISP-1 gene amplification and 
expression in human colon tumors showed a correlation between DNA amplification and over- 
expression..." (Pennica et al, page 14722, left column, first full paragraph, emphasis added). 
Thus the findings of Pennica et al with respect to WISP-1 support Appellants' arguments. In the 
case of WISP-3, the authors report that there was no change in the DNA copy number, but there 
was a change in mRNA levels. This apparent lack of correlation between DNA and mRNA 
levels is not contrary to Appellants' assertion that a change in DNA copy number generally leads 
to a change in mRNA level. Appellants are not attempting to predict the DNA copy number 
based on changes in mRNA level, and Appellants have not asserted that the only means for 
changing the level of mRNA is to change the DNA copy number. Therefore a change in mRNA 
without a change in DNA copy number is not contrary to Appellants' assertions. 

The fact that the single WISP-2 gene did not show the expected correlation of gene 
amplification with the level of mRNA/protein expression does not establish that it is more likely 
than not, in general, that such correlation does not exist. The Examiner has not shown whether 
the lack of correlation observed for the WISP-2 gene is typical, or is merely a discrepancy, an 
exception to the rule of correlation. Indeed, the working hypothesis among those skilled in the 
art is that, if a gene is amplified in cancer, the encoded protein is likely to be expressed at an 
elevated level, as was demonstrated for WISP-1. 

Accordingly, Appellants respectfully submit that Pennica et al teaches nothing 
conclusive regarding the absence of correlation between amplification of a gene and over- 
expression of the encoded WISP polypeptide. More importantly, the teaching of Pennica et al is 
specific to WISP genes. Pennica et al has no teaching whatsoever about the correlation of gene 
amplification and protein expression in general. 

The Examiner further cited the Gygi et al reference to establish that "the protein levels 
cannot be accurately predicted from the level of the corresponding mRNA transcript." The 
Examiner adds that "Gygi et al ... studied over 150 proteins ... and found no strong correlation 
between proteins and transcript levels." (Page 7 of the Office Action mailed May 20, 2004). 

Appellants respectfully traverse and point out that, on the contrary, Gygi et al never 
indicate that the correlation between mRNA and protein levels does not exist. Gygi et al only 
state that the correlation may not be sufficient in accurately predicting protein level from the 
level of the corresponding mRNA transcript (Emphasis added) (see page 1270, Abstract). This 
result is expected, since there are many factors that determine translation efficiency for a given 
transcript, or the half-life of the encoded protein. Not surprisingly, Gygi et al concluded that 
protein levels cannot always be accurately predicted from the level of the corresponding mRNA 
transcript in a single cellular stage or type when looking at the level of transcripts across different 
genes. 

Importantly, Gygi et al did not say that for a single gene, a change in the level of mRNA 
transcript is not positively correlated with a change in the level of protein expression. Appellants 
have asserted that increasing the level of mRNA for a particular gene leads to a corresponding 
increase for the encoded protein. Gygi et al did not study this issue and says absolutely nothing 
about it. One cannot look at the level of mRNA across several different genes to investigate 
whether a change in the level of mRNA for a particular gene leads to a change in the level of 
protein for that gene. Therefore, Gygi et al is not inconsistent with or contradictory to the utility 
of the instant claims, and offers no support for the PTO's rejection of Appellants' asserted utility. 

Furthermore, Appellants note that contrary to the Examiner's statement, the Gygi data 
indicate a general trend of correlation between protein [expression] and transcript levels 
(Emphasis added). For example, as shown in Figure 5, an mRNA abundance of 250-300 copies 
/cell correlates with a protein abundance of 500-1000 x 10^3 copies/cell. An mRNA abundance of 
100-200 copies/cell correlates with a protein abundance of 250-500 x 10^3 copies/cell (emphasis 
added). Therefore, high levels of mRNA generally correlate with high levels of proteins. In 
fact, most data points in Figure 5 did not deviate or scatter away from the general trend of 
correlation. Thus, the Gygi data meets the "more likely than not standard" and shows that a 
positive correlation exists between mRNA and protein. Therefore, Appellants submit that the 
Examiner's rejection is based on a misrepresentation of the scientific data presented in Gygi et al. 

Gygi et al may teach that protein levels cannot be "accurately predicted" from mRNA 
levels in the sense that the exact numerical amounts of protein present in a tissue cannot be 
determined based upon mRNA levels. Appellants respectfully submit that the Office Action's 
emphasis on the need to "accurately predict" protein levels based on mRNA levels misses the 
point. The asserted utility for the claimed polypeptides is in the diagnosis of cancer. What is 
relevant to use as a cancer diagnostic is relative levels of gene or protein expression, not absolute 
values, that is, that the gene or protein is differentially expressed in tumors as compared to 
normal tissues. Appellants need only show that there is a correlation between DNA, mRNA, and 
protein levels, such that gene amplification and mRNA overexpression generally predict protein 
overexpression. A showing that mRNA levels can be used to "accurately predict" the precise 
levels of protein expression is not required. 

In conclusion, the Examiner has not shown that a lack of correlation between gene 
amplification and polypeptide over-expression, as observed for the WISP-2 gene, is typical. In fact, 
contrary to what the Examiner contends, the art indicates that, if a gene is amplified in cancer, it 
is more likely than not that the encoded protein will be expressed at an elevated level. As noted 
even in Pennica et al, a correlation between DNA amplification and polypeptide over-expression 
was observed in the case of WISP-1 and similarly, in Gygi et al, most genes showed a 
correlation between mRNA levels and protein levels. Since the standard is not absolute 
certainty, a prima facie showing of lack of utility has not been made in this instance. 

The Examiner further cited Hu et al. to the effect that genes displaying a 5-fold change or 
less in mRNA expression in tumors compared to normal showed no evidence of a correlation 
between altered gene expression and a known role in the disease. However, among genes with a 
10-fold or more change in expression level, there was a strong and significant correlation 
between expression level and a published role in the disease. (Page 7 of the Office Action mailed 
July 19, 2005). 

Appellants first note that the title of Hu et al. is "Analysis of Genomic and Proteomic 
Data Using Advanced Literature Mining." As the title clearly suggests, the conclusion suggested 
by Hu et al. is merely based on a statistical analysis of the information disclosed in the published 
literature. As Hu et al. states, "We have utilized a computational approach to literature mining to 
produce a comprehensive set of gene-disease relationships." In particular, Hu et al. relied on the 
MedGene Database and the Medical Subject Heading (MeSH) files to analyze the gene-disease 
relationship. More specifically, Hu et al "compared the MedGene breast cancer gene list to a 
gene expression data set generated from a micro-array analysis comparing breast cancer and 
normal breast tissue samples." (See page 408, right column). Therefore, Appellants first submit 
that the reference by Hu et al only studies the statistical analysis of micro-array data and not 
gene amplification data. Thus their findings would not be directly applicable to gene 
amplification data. 

According to Hu et al, "different statistical methods" were applied to "estimate the 
strength of gene-disease relationships and evaluated the results." (See page 406, left column, 
emphasis added). Using these different statistical methods, Hu et al "[a]ssessed the relative 
strengths of gene-disease relationships based on the frequency of both co-citation and single 
citation." (See page 411, left column). It is well known in the art that various statistical methods 
allow different variables to be manipulated to affect the outcome. For example, the authors 
admit, "Initial attempts to search the literature using" the list of genes, gene names, gene 
symbols, and frequently used synonyms, generated by the authors "revealed several sources of 
false positives and false negatives." (See page 406, right column). The authors further admit 
that the false positives caused by "duplicative and unrelated meanings for the term" were 
"difficult to manage." Therefore, in order to minimize such false positives, Hu et al disclose 
that these terms "had to be eliminated entirely, thereby reducing the false positive rate but 
unavoidably under-representing some genes." Id. Hence, Appellants respectfully submit that in 
order to minimize the false positives and negatives in their analysis, Hu et al manipulated 
various aspects of the input data. 

Appellants further submit that the statistical analysis by Hu et al. is not a reliable standard 
because the frequency of citation reflects only the current research interest of a molecule rather 
than the true biological function of the molecule. Indeed, the authors acknowledge that 
"[r]elationship established by frequency of co-citation do not necessarily represent a true 
biological link." (See page 411, right column). One would expect that genes with the greatest 
change in expression in a disease would be the first targets of research, and therefore have the 
strongest known relationship to the disease as measured by the number of publications reporting 
a connection with the disease. The correlation reported in Hu only indicates that the greater the 
change in expression level, the more likely it is that there is a published or known role for the 
gene in the disease, as found by their automated literature-mining software. Thus, Hu's results 
merely reflect a bias in the literature toward studying the most prominent targets, and say nothing 
regarding the ability of a gene that is 2-fold or more differentially expressed in tumors to serve as 
a disease marker. 

Even assuming that Hu et al. provide evidence to support a true relationship, the 
conclusion in Hu et al only applies to a specific type of breast tumor (estrogen receptor (ER)- 
positive breast tumor) and cannot be generalized as a principle governing microarray study of 
breast cancer in general, let alone the various other types of cancer genes in general. In fact, even 
Hu et al admit that, "[i]t is likely that this threshold will change depending on the disease as well 
as the experiment. Interestingly, the observed correlation was only found among ER-positive 
(breast) tumors not ER-negative tumors." (See page 412, left column). Therefore, based on 
these findings, the authors add, "This may reflect a bias in the literature to study the more 
prevalent type of tumor in the population. Furthermore, this emphasizes that caution must be 
taken when interpreting experiments that may contain subpopulations that behave very 
differently." Id. (Emphasis added). 

More importantly, Hu et al did not look for a correlation between changes in mRNA and 
changes in protein levels, and therefore their results are not contrary to Appellants' assertion that 
there is a correlation between the two. Appellants are not relying on any "biological role" that 
the PRO274 polypeptide has in cancer for its asserted utility. Instead, Appellants are relying on 
the amplification of the gene encoding PRO274 in certain tumors compared to their normal tissue 
counterparts. Nowhere in Hu does it say that a lack of correlation in their study means that genes 
with a less than five-fold change in level of expression in cancer cannot serve as a diagnostic 
marker of cancer. 

In summary, Appellants respectfully submit that the Examiner has not shown that a 
change in gene expression level in tumor as compared to normal tissue is not correlated with a 
change in protein expression. The Patent Office has failed to meet its initial burden of proof that 
Appellants' claims of utility are not substantial or credible. The arguments presented by the 
Examiner in combination with the Pennica et al, Gygi et al and Hu et al articles do not provide 
sufficient reasons to doubt the statements by Appellants that PRO274 has utility. As discussed 
above, the law does not require that gene amplification "necessarily" results in increased 
expression at the mRNA and polypeptide levels. Therefore, Appellants submit that the 
Examiner's reasoning is based on a misrepresentation of the scientific data presented in the above 
cited references and application of an improper, heightened legal standard. In fact, contrary to 
what the Examiner contends, the art indicates that if a gene is amplified in cancer, it is more 
likely than not that the encoded protein will be expressed at an elevated level. 

D. It is "more likely than not" for amplified genes to have increased mRNA and 
protein levels 

Appellants have submitted ample evidence to show that, in general, if a gene is amplified 
in cancer, it is more likely than not that the encoded protein will be expressed at an elevated 
level. First, the articles by Orntoft et al, Hyman et al, and Pollack et al, (made of record in 
Appellants' Response filed September 14, 2004) collectively teach that in general gene 
amplification increases mRNA expression. Second, the Declaration of Dr. Paul Polakis, 
principal investigator of the Tumor Antigen Project of Genentech, Inc., the assignee of the 
present application, shows that, in general there is a correlation between mRNA levels and 
polypeptide levels. Thus, taken together, all of the submitted evidence supports Appellants' 
position that gene amplification is more likely than not predictive of increased mRNA and 
polypeptide levels. 

Appellants submit that there are numerous articles which show that generally, if a gene is 
amplified in cancer, it is more likely than not that the mRNA transcript will be expressed at an 
elevated level. For example, Orntoft et al (Mol and Cell. Proteomics, 2002, vol. 1, pages 37-45 
- made of record in Appellants' Response filed September 14, 2004) studied transcript levels of 
5600 genes in malignant bladder cancers, many of which were linked to the gain or loss of 
chromosomal material using an array-based method. Orntoft et al showed that there was a gene 
dosage effect and taught that "in general (18 of 23 cases) chromosomal areas with more than 2- 
fold gain of DNA showed a corresponding increase in mRNA transcripts" (see column 1, 
abstract). In addition, Hyman et al (Cancer Res., 2002, vol. 62, pages 6240-45 - made of record 
in Appellants' Response filed September 14, 2004) showed, using CGH analysis and cDNA 
microarrays which compared DNA copy numbers and mRNA expression of over 12,000 genes in 
breast cancer tumors and cell lines, that there was "evidence of a prominent global influence of 
copy number changes on gene expression levels." (See page 6244, column 1, last paragraph). 
Additional supportive teachings were also provided by Pollack et al, (PNAS, 2002, vol. 99, 
pages 12963-12968 - made of record in Appellants' Response filed September 14, 2004) who 
studied a series of primary human breast tumors and showed that "...62% of highly amplified 
genes show moderately or highly elevated expression, and DNA copy number influences gene 
expression across a wide range of DNA copy number alterations (deletion, low-, mid- and high- 
level amplification), and that on average, a 2-fold change in DNA copy number is associated 
with a corresponding 1.5-fold change in mRNA levels." Thus, these articles collectively teach 
that in general, gene amplification increases mRNA expression. 

In addition, in their Response filed September 14, 2004, Appellants submitted a 
Declaration by Dr. Polakis, principal investigator of the Tumor Antigen Project of Genentech, 
Inc., the assignee of the present application, to show that mRNA expression correlates well with 
protein levels, in general. As Dr. Polakis explains, the primary focus of the microarray project 
was to identify tumor cell markers useful as targets for both the diagnosis and treatment of cancer 
in humans. The scientists working on the project extensively rely on results of microarray 
experiments in their effort to identify such markers. As Dr. Polakis explains, using microarray 
analysis, Genentech scientists have identified approximately 200 gene transcripts (mRNAs) that 
are present in human tumor cells at significantly higher levels than in corresponding normal 
human cells. To the date of the Declaration, they have generated antibodies that bind to about 30 
of the tumor antigen proteins expressed from these differentially expressed gene transcripts and 
have used these antibodies to quantitatively determine the level of production of these tumor 
antigen proteins in both human cancer cells and corresponding normal cells. Having compared 
the levels of mRNA and protein in both the tumor and normal cells analyzed, they found a very 
good correlation between mRNA and corresponding protein levels. Specifically, in 
approximately 80% of their observations they have found that increases in the level of a 
particular mRNA correlate with changes in the level of protein expressed from that mRNA. 
While the proper legal standard is to show that the existence of correlation between mRNA and 
polypeptide levels is more likely than not, the showing of approximately 80% correlation for the 
molecules tested according to the Polakis Declaration greatly exceeds this legal standard. Based 
on these experimental data and his vast scientific experience of more than 20 years, Dr. Polakis 
states that, for human genes, increased mRNA levels typically correlate with an increase in 
abundance of the encoded protein. He further confirms that "it remains a central dogma in 
molecular biology that increased mRNA levels are predictive of corresponding increased levels 
of the encoded protein." 

Appellants further note that the sale of gene expression chips to measure mRNA levels is 
a highly successful business, with a company such as Affymetrix recording 168.3 million dollars 
in sales of their GeneChip arrays in 2004. Clearly, the research community believes that the 
information obtained from these chips is useful (i.e., that it is more likely than not informative of 
the protein level). 

Taken together, although there are some examples in the scientific art that do not fit 
within the central dogma of molecular biology that there is a correlation between polypeptide and 
mRNA levels, these instances are exceptions rather than the rule. In the majority of amplified 
genes, the teachings in the art, as exemplified by Orntoft et al, Hyman et al, Pollack et al, and 
the Polakis Declaration, overwhelmingly show that gene amplification influences gene 
expression at the mRNA and protein levels. Thus, one of skill in the art would reasonably expect 
in this instance, based on the amplification data for the PRO274 gene, that the PRO274 
polypeptide is concomitantly overexpressed. Accordingly, Appellants submit that the PRO274 
polypeptides and antibodies have utility in the diagnosis of cancer and based on such a utility, 
one of skill in the art would know exactly how to use the claimed antibodies for diagnosis of 
cancer. 

In the Office Action mailed July 19, 2005, the Examiner asserted that "Orntoft et al do 
not appear to look at gene amplification, mRNA levels and polypeptide levels from a single gene 
at a time. ... Orntoft et al concentrated on regions of chromosomes with strong gains of 
chromosomal material containing clusters of genes (p.40). This analysis was not done for 
PRO274 in the instant specification. That is, it is not clear whether or not PRO274 is in a gene 
cluster in a region of a chromosome that is highly amplified. Therefore, the relevance, if any of 
Orntoft et al is not clear." (Pages 5-6 of the Office Action mailed July 19, 2005). The Examiner 
further asserted that "Hyman et al also used CGH approach in their research. Less than half 
(44%) of highly amplified genes showed mRNA overexpression (abstract). ... Therefore, Hyman 
et al also do not support utility of the polypeptides of the instant invention." (Page 6 of the 
instant Office Action). The Examiner further asserted that "Pollack et al, also used CGH 
technology, concentrating on large chromosome regions showing high amplification (p. 12965). 
Pollack et al did not investigate polypeptide levels. Therefore, Pollack et al also do not support 
the asserted utility of the claimed invention." (Page 6 of the Office Action mailed July 19, 2005). 

In Orntoft et al, 1,800 genes that yielded an increase or decrease in mRNA expression in 
two invasive tumors compared to the two non-invasive papillomas were then mapped to 
chromosomal locations. The chromosomes had already been analyzed for amplification by 
hybridizing tumor DNA to normal metaphase chromosomes (CGH). Orntoft et al used CGH 
alterations as the independent variable and estimated the frequency of expression alterations of 
the 1,800 genes in the chromosomal areas. Orntoft et al found that in general (77% and 80% 
concordance) areas with a strong gain of chromosomal material contained a cluster of genes 
having increased mRNA expression (see page 40). Orntoft et al state, "For both tumors 
TCC733 (p<0.015) and TCC827 (p<0.00003) a highly significant correlation was observed 
between the level of CGH ratio change (reflecting the DNA copy number) and alterations 
detected by the array based technology" (see page 41, column 1). Orntoft et al also studied the 
relationship between altered mRNA and protein levels using 2D-PAGE analysis. Orntoft et al 
state, "In general there was a highly significant correlation (p<0.005) between mRNA and protein 
alterations. ... 26 well focused proteins whose genes had a known chromosomal location were 
detected in TCCs 733 and 335, and of these 19 correlated (p<0.005) with the mRNA changes 
detected using the arrays." (See page 42, column 2 to page 43, column 2). Accordingly, Orntoft 
et al clearly support Applicants' position that proteins expressed by genes that are amplified in 
tumors are useful as cancer markers. 

The Examiner indicates that Applicants have not indicated whether PRO274 is in a gene 
cluster region of a chromosome. (See page 5 of the Office Action mailed July 19, 2005). 
Applicants fail to see how this is relevant to the analysis. Orntoft et al did not limit their 
findings to only those regions of amplified gene clusters. Further, as discussed below, Hyman et 
al and Pollack et al did gene-by-gene analysis across all chromosomes. 

Applicants respectfully submit that the Examiner has mischaracterized the methods used 
by Hyman et al and Pollack et al in their analysis. These papers did not use traditional CGH 
analysis to identify amplified genes. In Hyman et al, 13,824 cDNA clones were placed on glass 
slides in a microarray and genomic DNA from breast cancer cell lines and normal human WBCs 
were hybridized to the cDNA sequences. For expression analysis, RNA from tumor cell lines 
was hybridized on the same microarrays. The 13,824 arrayed cDNA clones were analyzed for 
gene expression and gene copy number in 14 breast cancer cell lines. 

The Examiner has asserted that Hyman et al found that "[l]ess than half (44%) of highly 
amplified genes showed mRNA expression." (Page 6 of the Office Action mailed July 19, 
2005). In the more detailed discussion of their results, Hyman et al teach that "[u]p to 44% of 
the highly amplified transcripts (CGH ratio, >2.5) were overexpressed (i.e., belonged to the 
global upper 7% of expression ratios) compared with only 6% for genes with normal copy 
number." (See page 6242, col. 1; emphasis added). These details make it clear that Hyman et al 
set a highly restrictive standard for considering a gene to be overexpressed; yet almost half of all 
highly amplified transcripts met even this highly restrictive standard. 

Further, Hyman et al state that "[t]he cDNA/CGH microarray technique enables the 
direct correlation of copy number and expression data on a gene-by-gene basis throughout the 
genome." (See page 6242, column 2). Therefore, the analysis performed by Hyman et al. was on 
a gene-by-gene basis, and clearly shows that "it is more likely than not" that a gene which is 
amplified in tumor cells will have increased gene expression. 

In Pollack et al, DNA copy number alteration across 6,691 mapped human genes in 44 
predominantly advanced primary breast tumors and 10 breast cancer cell lines was profiled. 
Pollack et al. further state, "Parallel microarray measurements of mRNA levels reveal the 
remarkable degree to which variation in gene copy number contributes to variation in gene 
expression in tumor cells." (See Abstract). "Genome-wide, of 117 high-level DNA 
amplifications (fluorescence ratios >4, and representing 91 different genes), 62% (representing 
54 different genes; . . .) are found associated with at least moderately elevated mRNA levels 
(mean-centered fluorescence ratios >2), and 42% (representing 36 different genes) are found 
associated with comparably highly elevated mRNA levels (mean-centered fluorescence ratios 
>4)." (See page 12966, column 1). Therefore, the analysis performed by Pollack et al. was also 
on a gene-by-gene basis, and clearly shows that "it is more likely than not" that a gene which is 
amplified in tumor cells will have increased gene expression. 

The Examiner further asserts that "none of the three papers reported that the research was 
relevant to identifying probes that can be used as cancer diagnostics." (Page 6 of the Office 
Action mailed July 19, 2005). Applicants respectfully point out that Hyman et al. conducted 
additional studies of one of the genes found to be amplified, HOXB7, and found "a clinical 
association between HOXB7 amplification and poor patient prognosis." (Page 6244, col. 1 to 
col. 2; emphasis added). Thus the results of Hyman et al. confirm that genes which are amplified 
in tumors have prognostic utility. The Examiner's attention is also respectfully directed to the 
final paragraph of Pollack et al, wherein the authors conclude that "a substantial portion of the 
phenotypic uniqueness (and, by extension, the heterogeneity in clinical behavior) among patients' 
tumors may be traceable to underlying variation in DNA copy number." (Page 12968, col. 2). 
Accordingly, Pollack et al confirm that genes that are amplified in at least one type of tumor are 
useful as markers for that type of tumor, and for prognostic uses directed to that type of tumor. 

With respect to the correlation between mRNA expression and protein levels, the 
Examiner asserts that the Polakis Declaration is insufficient to overcome the rejection of claims 
58-62 since it is limited to a discussion of data regarding the correlation of mRNA levels and 
polypeptide levels and not gene amplification levels. The Examiner asserts that the Declaration 
does not provide data such that the Examiner can independently draw conclusions. (Page 7 of 
the instant Office Action). 

Appellants submit that Dr. Polakis' Declaration was presented to support the position that 
there is a correlation between mRNA levels and polypeptide levels, the correlation between gene 
amplification and mRNA levels having already been established by the data shown in the Orntoft 
et al, Hyman et al, and Pollack et al articles. Appellants emphasize that the opinions expressed 
in the Polakis Declaration, including the quoted statement, are all based on factual findings. 
Subsequently, antibodies binding to about 30 of these tumor antigens were prepared, and mRNA 
and protein levels were compared. In approximately 80% of the cases, the researchers found that 
increases in the level of a particular mRNA correlated with changes in the level of protein 
expressed from that mRNA when human tumor cells are compared with their corresponding 
normal cells. Dr. Polakis' statement that "an increased level of mRNA in a tumor cell relative to 
a normal cell typically correlates to a similar increase in abundance of the encoded protein in the 
tumor cell relative to the normal cell" is based on factual experimental findings, clearly set forth 
in the Declaration. Accordingly, the Declaration is not merely conclusory, and the fact-based 
conclusions of Dr. Polakis would be considered reasonable and accurate by one skilled in the art. 

The case law has clearly established that in considering affidavit evidence, the Examiner 
must consider all of the evidence of record anew. 20 "After evidence or argument is submitted by 
the applicant in response, patentability is determined on the totality of the record, by a 



20 In re Rinehart, 531 F.2d 1084, 189 U.S.P.Q. 143 (C.C.P.A. 1976); In re Piasecki, 745 F.2d. 1015, 226 
U.S.P.Q. 881 (Fed. Cir. 1985). 

preponderance of the evidence with due consideration to persuasiveness of argument." 21 
Furthermore, the Federal Court of Appeals held in In re Alton, "[W]e are aware of no reason why 
opinion evidence relating to a fact issue should not be considered by an examiner." 22 Applicants 
also respectfully draw the Examiner's attention to the Utility Examination Guidelines 23 which 
state, "Office personnel must accept an opinion from a qualified expert that is based upon 
relevant facts whose accuracy is not being questioned; it is improper to disregard the opinion 
solely because of a disagreement over the significance or meaning of the facts offered." 
The statement in question from an expert in the field (the Polakis Declaration) states that "it is 
my considered scientific opinion that for human genes, an increased level of mRNA in a tumor 
cell relative to a normal cell typically correlates to a similar increase in abundance of the encoded 
protein in the tumor cell relative to the normal cell." Therefore, barring evidence to the contrary 
regarding the above statement in the Polakis Declaration, this rejection is improper under both 
the case law and the Utility guidelines. 

Taken together, although there are some examples in the scientific art that do not fit 
within the central dogma of molecular biology that there is a correlation between polypeptide and 
mRNA levels, these instances are exceptions rather than the rule. In the majority of amplified 
genes, the teachings in the art, as exemplified by Orntoft et al., Hyman et al., Pollack et al., and 
the Polakis Declaration, overwhelmingly show that gene amplification influences gene 
expression at the mRNA and protein levels. Therefore, one of skill in the art would reasonably 
expect in this instance, based on the amplification data for the PRO274 gene, that the PRO274 
polypeptide is concomitantly overexpressed. Thus, Appellants submit that the PRO274 
polypeptide and the claimed antibodies that bind it have utility in the diagnosis of cancer. 



21 In re Alton, 37 U.S.P.Q.2d 1578, 1584 (Fed. Cir. 1996)(quoting In re Oetiker, 977 F.2d 1443, 1445, 24 
U.S.P.Q.2d 1443, 1444 (Fed. Cir. 1992)). 

22 Id. at 1583. 

23 Part IIB, 66 Fed. Reg. 1098 (2001). 

E. Even if a prima facie case of lack of utility has been established, it should be 
withdrawn on consideration of the totality of evidence 

Even if one assumes arguendo that it is more likely than not that there is no correlation 
between gene amplification and increased mRNA/protein expression, which Appellants submit is 
not true, a polypeptide encoded by a gene that is amplified in cancer would still have a specific, 
substantial, and credible utility. In support, Appellants respectfully draw the Board's attention to 
page 2 of the Declaration of Dr. Avi Ashkenazi (submitted with the Response filed September 
14, 2004) which explains that, 

even when amplification of a cancer marker gene does not result in significant 
over-expression of the corresponding gene product, this very absence of gene 
product over-expression still provides significant information for cancer diagnosis 
and treatment. Thus, if over-expression of the gene product does not parallel gene 
amplification in certain tumor types but does so in others, then parallel monitoring 
of gene amplification and gene product over-expression enables more accurate 
tumor classification and hence better determination of suitable therapy. In 
addition, absence of over-expression is crucial information for the practicing 
clinician. If a gene is amplified but the corresponding gene product is not over- 
expressed, the clinician accordingly will decide not to treat a patient with agents 
that target that gene product. 

Appellants thus submit that simultaneous testing of gene amplification and gene product 
over-expression enables more accurate tumor classification, even if the gene-product, the protein, 
is not over-expressed. This leads to better determination of a suitable therapy. Further, as 
explained in Dr. Ashkenazi's Declaration, absence of over-expression of the protein itself is 
crucial information for the practicing clinician. If a gene is amplified in a tumor, but the 
corresponding gene product is not over-expressed, the clinician will decide not to treat a patient 
with agents that target that gene product. This not only saves money, but also has the benefit that 
the patient can avoid exposure to the side effects associated with such agents. 

This utility is further supported by the teachings of the article by Hanna and Mornin. 
(Pathology Associates Medical Laboratories, August (1999); submitted with the Response filed 
September 14, 2004). The article teaches that the HER-2/neu gene has been shown to be 
amplified and/or over-expressed in 10%-30% of invasive breast cancers and in 40%-60% of 
intraductal breast carcinomas. Further, the article teaches that diagnosis of breast cancer includes 
testing both the amplification of the HER-2/neu gene (by FISH) as well as the over-expression of 
the HER-2/neu gene product (by IHC). Even when the protein is not over-expressed, the assay 
relying on both tests leads to a more accurate classification of the cancer and a more effective 
treatment of it. 

The Examiner asserts that "[t]he Hanna reference is not applicable to the instant fact 
situation, as it deals with a known tumor associated gene, and not with a prospective analysis of 
the type found in this specification." (Page 9 of the Office Action mailed July 19, 2005). To the 
contrary, Appellants have clearly shown that the gene encoding the PRO274 polypeptide is 
amplified in at least three primary lung tumors. Therefore, the PRO274 gene, similar to the 
HER-2/neu gene disclosed in Hanna et al., is a tumor associated gene. Furthermore, as discussed 
above, in the majority of amplified genes, the teachings in the art overwhelmingly show that gene 
amplification influences gene expression at the mRNA and protein levels. Therefore, one of skill 
in the art would reasonably expect in this instance, based on the amplification data for the 
PRO274 gene, that the PRO274 polypeptide is concomitantly overexpressed. 

However, even if gene amplification does not result in overexpression of the gene product 
(i.e., the protein) an analysis of the expression of the protein is useful in determining the course 
of treatment, as supported by the Ashkenazi Declaration. The Examiner has asserted that "the 
gene product of the instant invention has not been demonstrated to be involved in cancer. Over- 
expression of a gene product in a cancer cell does not necessarily mean that the gene product is 
involved in the cancer and that targeting the gene product would be therapeutic." (Page 9 of the 
Office Action mailed July 19, 2005). The Examiner appears to view the testing described in the 
Ashkenazi Declaration and the Hanna paper as experiments involving further characterization of 
the PRO274 polypeptide itself. In fact, such testing is for the purpose of characterizing not the 
PRO274 polypeptide, but the tumors in which the gene encoding PRO274 is amplified. Testing 
of tumor markers such as PRO274 is useful for tumor categorization even if the tested marker is 
not itself the intended therapeutic target. The PRO274 polypeptide is therefore useful in tumor 
categorization, the results of which become an important tool in the hands of a physician 
enabling the selection of a treatment modality that holds the most promise for the successful 
treatment of a patient. 

On Appeal to the Board of Patent Appeals and Interferences 

Appellants' Brief 
Application Serial No. 09/978,192 
Attorney's Docket No. 39780-2630 P1C9 



For the reasons given above, Appellants respectfully submit that the present specification 
clearly describes, details and provides a patentable utility for the claimed invention. 
Accordingly, Appellants respectfully request reconsideration and reversal of the rejections of 
Claims 58-62 under 35 U.S.C. §101. 

ISSUE II: Claims 58-62 satisfy the enablement requirement of 35 U.S.C. §112, first 
paragraph. 

Claims 58-62 stand rejected under 35 U.S.C. §112, first paragraph, allegedly "since the 
claimed invention is not supported by either a specific and substantial asserted utility or a well 
established utility for the reasons set forth above, one skilled in the art clearly would not know how 
to use the claimed invention." (Page 9 of the Office Action mailed July 19, 2005). 

In this regard, Appellants refer to the arguments and information presented above in 
response to the outstanding rejection under 35 U.S.C. § 101, wherein those arguments are 
incorporated by reference herein. Appellants respectfully submit that as described above, the 
PRO274 polypeptide has utility in the diagnosis of cancer and based on such a utility, one of skill 
in the art would know exactly how to use the claimed antibodies that bind the PRO274 
polypeptide for diagnosis of cancer, without undue experimentation. 

Accordingly, Appellants respectfully request reconsideration and reversal of the 
enablement rejection of Claims 58-62 under 35 U.S.C. §112, first paragraph. 

ISSUE III: Claims 58-62 are patentable under 35 U.S.C. §102(b) over Ho et al. 

Claims 58-62 stand rejected under 35 U.S.C. §102(b) as being anticipated by Ho et al., 
Science, Vol. 289, pp 265-270, published July 14, 2000. 

Appellants submit that, as discussed above in response to the outstanding rejections under 
35 U.S.C. §101 and 35 U.S.C. §112, first paragraph, for alleged lack of utility and enablement, 
Appellants rely on the gene amplification results (Example 114) to establish a credible, 
substantial and specific asserted utility for the PRO274 polypeptide and the claimed antibodies 
that bind it. These results were first disclosed in International Application No. 
PCT/US00/03565, filed February 11, 2000. As discussed above, the disclosure of the instant 
application, which is similar to that of the earlier-filed application (PCT/US00/03565), provides 
the support required under 35 U.S.C. §112 for the subject matter of the instant claims. 
Accordingly, Applicants submit that the subject matter of the instant claims is disclosed in the 
manner provided by 35 U.S.C. §112 in PCT/US00/03565. Therefore, the effective filing date of 
this application is February 11, 2000, the filing date of PCT/US00/03565. 

The scientific journal article by Ho et al. was published on July 14, 2000, which is over 
five months after the effective filing date of the instant application; hence Ho et al. is not prior 
art. 

The Examiner has asserted that the present claims are not entitled to the February 11, 
2000, filing date of PCT/US00/03565 because "the gene amplification assay fails to disclose a 
patentable utility for the antibodies to the protein." (Page 10 of the Office Action mailed July 19, 
2005). 

In this regard, Appellants refer to the arguments and information presented above in 
response to the outstanding rejections under 35 U.S.C. §101 and 35 U.S.C. §112, first paragraph, 
for alleged lack of utility and enablement. These arguments are incorporated by reference herein. 
Appellants respectfully submit that as described above under Issue I, the presently claimed 
invention is supported by a specific, substantial and credible utility and, therefore, the present 
specification teaches one of ordinary skill in the art "how to use" the claimed invention without 
undue experimentation, as described above. 

Accordingly, Appellants respectfully request reconsideration and reversal of the rejection 
of Claims 58-62 under 35 U.S.C. §102(b) as being anticipated by Ho et al. 

ISSUE IV: Claims 59-62 are patentable under 35 U.S.C. §103(a) over Ho et al. in view of 
Janeway et al. 

Claims 59-62 stand rejected under 35 U.S.C. §103(a) as being unpatentable over Ho et al. 
in view of Immunology, The Immune System in Health and Disease, Third Edition, Janeway and 
Travers, Ed., 1997. 

As discussed above, the effective filing date of this application is February 11, 2000, the 
filing date of PCT/US00/03565. The scientific journal article by Ho et al. was published on July 
14, 2000, which is over five months after the effective filing date of the instant application; hence 
Ho et al. is not prior art, and is not available as a reference under 35 U.S.C. §103. 

Accordingly, Appellants respectfully request reconsideration and reversal of the rejection 
of Claims 59-62 under 35 U.S.C. § 103(a). 



CONCLUSION 



For the reasons given above, Appellants submit that the specification discloses at least 
one patentable utility for the antibodies of Claims 58-62, and that one of ordinary skill in the art 
would understand how to use the claimed antibodies, for example in the diagnosis of lung 
tumors. Therefore, Claims 58-62 meet the requirements of 35 U.S.C. §101 and 35 U.S.C. §112, 
first paragraph. 

Further, this patentable utility for the claimed antibodies was first disclosed in 
International Application No. PCT/US00/03565, filed February 11, 2000, priority to which is 
claimed in the instant application. Accordingly, the instant application has an effective priority 
date of February 11, 2000, and therefore Ho et al., Science, Vol. 289, pp 265-270, published on 
July 14, 2000, is not prior art and does not anticipate the claims under 35 U.S.C. § 102(b) or 
render the claims obvious under 35 U.S.C. § 103(a) in view of Janeway et al. 

Accordingly, reversal of all the rejections of Claims 58-62 is respectfully requested. 

Please charge any additional fees, including fees for additional extension of time, or 
credit overpayment to Deposit Account No. 08-1641 (referencing Attorney's Docket 
No. 39780-2630 P1C9). 

Date: February 17, 2006 




Barrie D. Greene (Reg. No. 46,740) 

CLAIMS APPENDIX 

Claims on Appeal 

58. An isolated antibody that specifically binds to the polypeptide of SEQ ID NO:7. 

59. The antibody of Claim 58 which is a monoclonal antibody. 

60. The antibody of Claim 58 which is a humanized antibody. 

61. An antigen binding fragment of the antibody of Claim 58. 

62. The antibody of Claim 58 which is labeled. 

9. EVIDENCE APPENDIX 

1. Declaration of Audrey D. Goddard, Ph.D. under 37 C.F.R. §1.132, with attached Exhibits 
A-G: 

A. Curriculum Vitae of Audrey D. Goddard, Ph.D. 

B. Higuchi, R. et al., "Simultaneous amplification and detection of specific DNA 
sequences," Biotechnology 10:413-417 (1992). 

C. Livak, K.J., et al., "Oligonucleotides with fluorescent dyes at opposite ends provide a 
quenched probe system useful for detecting PCR product and nucleic acid hybridization," PCR 
Methods Appl. 4:357-362 (1995). 

D. Heid, C.A. et al., "Real time quantitative PCR," Genome Res. 6:986-994 (1996). 

E. Pennica, D. et al., "WISP genes are members of the connective tissue growth factor 
family that are up-regulated in Wnt-1-transformed cells and aberrantly expressed in human colon 
tumors," Proc. Natl. Acad. Sci. USA 95:14717-14722 (1998). 

F. Pitti, R.M. et al., "Genomic amplification of a decoy receptor for Fas ligand in lung and 
colon cancer," Nature 396:699-703 (1998). 

G. Bieche, I. et al., "Novel approach to quantitative polymerase chain reaction using real- 
time detection: Application to the detection of gene amplification in breast cancer," Int. J. Cancer 
78:661-666 (1998). 

2. Declaration of Paul Polakis, Ph.D. under 37 C.F.R. §1.132. 

3. Declaration of Avi Ashkenazi, Ph.D. under 37 C.F.R. §1.132, with attached Exhibit A 
(Curriculum Vitae). 

4. Orntoft, T.F., et al., "Genome-wide Study of Gene Copy Numbers, Transcripts, and 
Protein Levels in Pairs of Non-Invasive and Invasive Human Transitional Cell Carcinomas," 
Molecular & Cellular Proteomics 1:37-45 (2002). 

5. Hyman, E., et al., "Impact of DNA Amplification on Gene Expression Patterns in Breast 
Cancer," Cancer Research 62:6240-6245 (2002). 

6. Pollack, J.R., et al., "Microarray Analysis Reveals a Major Direct Role of DNA Copy 
Number Alteration in the Transcriptional Program of Human Breast Tumors," Proc. Natl. Acad. 
Sci. USA 99:12963-12968 (2002). 

7. Hanna, J.S., et al., "HER-2/neu Breast Cancer Predictive Testing," Pathology Associates 
Medical Laboratories (1999). 

8. Sen, S., "Aneuploidy and cancer," Curr. Opin. Oncol. 12:82-88 (2000). 

9. Pennica, D. et al., "WISP genes are members of the connective tissue growth factor family 
that are up-regulated in Wnt-1-transformed cells and aberrantly expressed in human colon tumors," 
Proc. Natl. Acad. Sci. USA 95:14717-14722 (1998). 

10. Gygi, S. P. et al., "Correlation between protein and mRNA abundance in yeast," Mol. Cell. 
Biol. 19:1720-1730 (1999). 

11. Hu, Y. et al., "Analysis of genomic and proteomic data using advanced literature mining," 
Journal of Proteome Research 2:405-412 (2003). 

Items 1-3 were submitted with Appellants' Response filed September 14, 2004, and acknowledged 
as having been considered by the Examiner in the Office Action mailed July 19, 2005. 

Items 4-7 were made of record by Appellants in their IDS filed September 14, 2004. 

Items 8-10 were made of record by the Examiner in the Office Action mailed May 20, 2004. 

Item 11 was made of record by the Examiner in the Office Action mailed July 19, 2005. 

10. RELATED PROCEEDINGS APPENDIX 

None. 



PATENT 

IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 



In re Application of: Ashkenazi et al. 

Group Art Unit: [illegible] 

Serial No.: 09/903,925 

Examiner: Fozia Hamid 

Filed: July 11, 2001 

For: SECRETED AND TRANSMEMBRANE POLYPEPTIDES AND NUCLEIC ACIDS 

CERTIFICATE OF MAILING 

I hereby certify that this correspondence is being deposited with the United States Postal 
Service with sufficient postage as first class mail in an envelope addressed to: Assistant 
Commissioner of Patents, Washington, D.C. 20231 on [date illegible]. 





DECLARATION OF AUDREY D. GODDARD, Ph.D. UNDER 37 C.F.R. § 1.132 

Assistant Commissioner of Patents 
Washington, D.C. 20231 

Sir: 

I, Audrey D. Goddard, Ph.D, do hereby declare and say as follows: 

1. I am a Senior Clinical Scientist at the Experimental Medicine/BioOncology, Medical 
Affairs Department of Genentech, Inc., South San Francisco, California 94080. 

2. Between 1993 and 2001, I headed the DNA Sequencing Laboratory at the Molecular 
Biology Department of Genentech, Inc. During this time, my responsibilities included the 
identification and characterization of genes contributing to the oncogenic process, and determination 
of the chromosomal localization of novel genes. 

3. My scientific Curriculum Vitae, including my list of publications, is attached to and 
forms part of this Declaration (Exhibit A). 



4. I am familiar with a variety of techniques known in the art for detecting and 
quantifying the amplification of oncogenes in cancer, including the quantitative TaqMan PCR (i.e., 
"gene amplification") assay described in the above captioned patent application. 

5. The TaqMan PCR assay is described, for example, in the following scientific 
publications: Higuchi et al., Biotechnology 10:413-417 (1992) (Exhibit B); Livak et al., PCR 
Methods Appl. 4:357-362 (1995) (Exhibit C) and Heid et al., Genome Res. 6:986-994 (1996) 
(Exhibit D). Briefly, the assay is based on the principle that successful PCR yields a fluorescent 
signal due to Taq DNA polymerase-mediated exonuclease digestion of a fluorescently labeled 
oligonucleotide that is homologous to a sequence between two PCR primers. The extent of 
digestion depends directly on the amount of PCR, and can be quantified accurately by measuring the 
increment in fluorescence that results from decreased energy transfer. This is an extremely sensitive 
technique, which allows detection in the exponential phase of the PCR reaction and, as a result, 
leads to accurate determination of gene copy number. 

6. The quantitative fluorescent TaqMan PCR assay has been extensively and 
successfully used to characterize genes involved in cancer development and progression. 
Amplification of protooncogenes has been studied in a variety of human tumors, and is widely 
considered as having etiological, diagnostic and prognostic significance. This use of the quantitative 
TaqMan PCR assay is exemplified by the following scientific publications: Pennica et al., Proc. 
Natl. Acad. Sci. USA 95(25):14717-14722 (1998) (Exhibit E); Pitti et al., Nature 
396(6712):699-703 (1998) (Exhibit F) and Bieche et al., Int. J. Cancer 78:661-666 (1998) (Exhibit 
G), of the first two of which I am a co-author. In particular, Pennica et al. have used the quantitative 
TaqMan PCR assay to study relative gene amplification of WISP and c-myc in various cell lines, 
colorectal tumors and normal mucosa. Pitti et al. studied the genomic amplification of a decoy 
receptor for Fas ligand in lung and colon cancer, using the quantitative TaqMan PCR assay. Bieche 
et al. used the assay to study gene amplification in breast cancer. 

7. It is my personal experience that the quantitative TaqMan PCR technique is 
technically sensitive enough to detect at least a 2-fold increase in gene copy number relative to 
control. It is further my considered scientific opinion that an at least 2-fold increase in gene copy 
number in a tumor tissue sample relative to a normal (i.e., non-tumor) sample is significant and 
useful in that the detected increase in gene copy number in the tumor sample relative to the normal 
sample serves as a basis for using relative gene copy number as quantitated by the TaqMan PCR 
technique as a diagnostic marker for the presence or absence of tumor in a tissue sample of unknown 
pathology. Accordingly, a gene identified as being amplified at least 2-fold by the quantitative 
TaqMan PCR assay in a tumor sample relative to a normal sample is useful as a marker for the 
diagnosis of cancer, for monitoring cancer development and/or for measuring the efficacy of cancer 
therapy. 

8. I declare further that all statements made herein of my own knowledge are true and 
that all statements made on information and belief are believed to be true. I declare that these 
statements were made with the knowledge that willful false statements and the like so made are 
punishable by fine or imprisonment, or both, under Section 1001 of Title 18 of the United States 
Code, and that such willful false statements may jeopardize the validity of the application or any 
patent issuing thereon. 



Date 



Audrey D. Goddard, Ph.D. 



AUDREY D. GODDARD, Ph.D. 



110 Congo St. 

San Francisco, CA, 94131 

415.841.9154 

415.819.2247 (mobile) 

agoddard@pacbell.net 



PROFESSIONAL EXPERIENCE 

Genentech, Inc. 1993-present 
South San Francisco, CA 

2001 - present Senior Clinical Scientist 

Experimental Medicine / BioOncology, Medical Affairs 

Responsibilities: 

• Companion diagnostic oncology products 

• Acquisition of clinical samples from Genentech's clinical trials for translational research 

• Translational research using clinical specimen and data for drug development and 
diagnostics 

• Member of Development Science Review Committee, Diagnostic Oversight Team, 21 CFR 
Part 11 Subteam 

Interests: 

• Ethical and legal implications of experiments with clinical specimens and data 

• Application of pharmacogenomics in clinical trials 



1998 - 2001 Senior Scientist 

Head of the DNA Sequencing Laboratory, Molecular Biology Department, Research 
Responsibilities: 

• Management of a laboratory of up to nineteen, including postdoctoral fellow, associate 
scientist, senior research associate and research assistant/associate levels 

• Management of a $750K budget 

• DNA sequencing core facility supporting a 350+ person research facility. 

• DNA sequencing for high throughput gene discovery: ESTs, cDNAs, and constructs 

• Genomic sequence analysis and gene identification 

• DNA sequence and primary protein analysis 

Research: 

• Chromosomal localization of novel genes 

• Identification and characterization of genes contributing to the oncogenic process 

• Identification and characterization of genes contributing to inflammatory diseases 

• Design and development of schemes for high throughput genomic DNA sequence analysis 

• Candidate gene prediction and evaluation 



Genentech, Inc. 
1 DNA Way 

South San Francisco, CA, 94080 

650.225.6429 

goddarda@gene.com 

1993 - 1998 Scientist 

Head of the DNA Sequencing Laboratory, Molecular Biology Department, Research 
Responsibilities 

• DNA sequencing core facility supporting a 350+ person research facility 

• Assumed responsibility for a pre-existing team of five technicians and expanded the group 
into fifteen, introducing a level of middle management and additional areas of research 

• Participated in the development of the basic plan for high throughput secreted protein 
discovery program - sequencing strategies, data analysis and tracking, database design 

• High throughput EST and cDNA sequencing for new gene identification. 

• Design and implementation of analysis tools required for high throughput gene identification. 

• Chromosomal localization of genes encoding novel secreted proteins. 

Research: 

• Genomic sequence scanning for new gene discovery. 

• Development of signal peptide selection methods. 

• Evaluation of candidate disease genes. 

• Growth hormone receptor gene SNPs in children with Idiopathic short stature 

Imperial Cancer Research Fund 1989-1992 
London, UK with Dr. Ellen Solomon 

6/89 -12/92 Postdoctoral Fellow 

• Cloning and characterization of the genes fused at the acute promyelocytic leukemia 
translocation breakpoints on chromosomes 17 and 15. 

• Prepared a successfully funded European Union multi-center grant application 



McMaster University 

Hamilton, Ontario, Canada with Dr. G. D. Sweeney 
5/83 - 8/83: NSERC Summer Student 

• In vitro metabolism of β-naphthoflavone in C57BL/6J and DBA mice 



EDUCATION 

Ph.D. 1989 
University of Toronto, Toronto, Ontario, Canada. 
Department of Medical Biophysics. 
"Phenotypic and genotypic effects of mutations in the human retinoblastoma gene." 
Supervisor: Dr. R. A. Phillips 

Honours B.Sc. 1983 
McMaster University, Hamilton, Ontario, Canada. 
Department of Biochemistry 
"The in vitro metabolism of the cytochrome P-448 inducer β-naphthoflavone in C57BL/6J mice." 
Supervisor: Dr. G. D. Sweeney 

ACADEMIC AWARDS 

Imperial Cancer Research Fund Postdoctoral Fellowship 1989-1992 
Medical Research Council Studentship 1983-1988 
NSERC Undergraduate Summer Research Award 1983 
Society of Chemical Industry Merit Award (Hons. Biochem.) 1983 
Dr. Harry Lyman Hooker Scholarship 1981-1983 
J.L.W. Gill Scholarship 1981-1982 
Business and Professional Women's Club Scholarship 1980-1981 
Wyerhauser Foundation Scholarship 1979-1980 



INVITED PRESENTATIONS 

Genentech's gene discovery pipeline: High throughput identification, cloning and 
characterization of novel genes. Functional Genomics: From Genome to Function, Litchfield 
Park, AZ, USA. October 2000 

High throughput identification, cloning and characterization of novel genes. G2K: Back to 
Science, Advances in Genome Biology and Technology I. Marco Island, FL, USA. February 
2000 

Quality control in DNA Sequencing: The use of Phred and Phrap. Bay Area Sequencing 
Users Meeting, Berkeley, CA, USA. April 1999 

High throughput secreted protein identification and cloning. Tenth International Genome 
Sequencing and Analysis Conference, Miami, FL, USA. September 1998 

The evolution of DNA sequencing: The Genentech perspective. Bay Area Sequencing Users 
Meeting, Berkeley, CA, USA. May 1998 

Partial Growth Hormone Insensitivity: The role of GH-receptor mutations in Idiopathic Short 
Stature. Tenth Annual National Cooperative Growth Study Investigators Meeting, San 
Francisco, CA, USA. October 1996 

Growth hormone (GH) receptor defects are present in selected children with non-GH-deficient 
short stature: A molecular basis for partial GH-insensitivity. 76th Annual Meeting of The 
Endocrine Society, Anaheim, CA, USA. June 1994 

A previously uncharacterized gene, myl, is fused to the retinoic acid receptor alpha gene in 
acute promyelocytic leukemia. XV International Association for Comparative Research on 
Leukemia and Related Disease, Padua, Italy. October 1991 

PATENTS 

Goddard A, Godowski PJ, Gurney AL NL2 Tie ligand homologue polypeptide. Patent 
Number: 6,455,496. Date of Patent; Sept. 24, 2002. 

Goddard A, Godowski PJ and Gurney AL. NL3 Tie ligand homologue nucleic acids. Patent 
Number: 6,426,218. Date of Patent: July 30, 2002. 

Godowski P, Gurney A, Hillan KJ, Botstein D, Goddard A, Roy M, Ferrara N t Tumas D, 
Schwall R. NL4 Tie ligand homologue nucleic acid. Patent Number: 6,4137,770. Date of 
Patent: July 2, 2002. 

Ashkenazi A, Fong S, Goddard A, Gurney AL, Napier MA t Tumas D, Wood Wl. Nucleic acid 
encoding A-33 related antigen poly peptides. Patent Number: 6,410,708. Date of Patent:: 
Jun. 25, 2002. 

Botstein DA, Cohen RL, Goddard AD, Gurney AL, Hillan KJ, Lawrence DA, Levine AJ, 
Pennica D, Roy MA and Wood Wl. WISP polypeptides and nucleic acids encoding same. 
Patent Number: 6,387,657. Date of Patent: May 14, 2002. 

Goddard A, Godowski PJ and Gurney AL Tie ligands. Patent Number: 6,372,491. Date of 
Patent: April 16, 2002. 

Godowski PJ, Gurney AL, Goddard A and Hillan K. TIE ligand homologue antibody. Patent 
Number: 6,350,450. Date of Patent: Feb. 26, 2002. 

Fong S, Ferrara N, Goddard A, Godowski PJ, Gurney AL, Hillan K and Williams PM. Tie 
receptor tyrosine kinase ligand homologues. Patent Number: 6,348,351 . Date of Patent: 
Feb. 19, 2002. 

Goddard A, Godowski PJ and Gurney AL. Ligand homologues. Patent Number: 6,348,350. 
Date of Patent: Feb. 19, 2002. 

Attie KM, Carlsson LMS, Gesundheit N and Goddard A. Treatment of partial growth 
hormone 1 insensitivity syndrome. Patent Number: 6,207,640. Date of Patent: March 27, 
2001. 

Fong S, Ferrara N, Goddard A, Godowski PJ, Gurney AL, Hillan K and Williams PM. Nucleic 
acids encoding NL-3. Patent Number: 6,074,873. Date of Patent: June 13, 2000. 

Attie K, Carlsson LMS, Gesundheit N and Goddard A. Treatment of partial growth hormone 
insensitivity syndrome. Patent Number: 5,824,642. Date of Patent: October 20, 1998. 

Attie K, Carlsson LMS, Gesundheit N and Goddard A. Treatment of partial growth hormone 
insensitivity syndrome. Patent Number: 5,646,113. Date of Patent: July 8, 1997. 

Multiple additional provisional applications filed. 

PUBLICATIONS 

Seshasayee D, Dowd P, Gu Q, Erickson S, Goddard AD. Comparative sequence analysis of 
the HER2 locus in mouse and man. Manuscript in preparation. 

Abuzzahab MJ, Goddard A, Grigorescu F, Lautier C, Smith RJ and Chernausek SD. Human 
IGF-1 receptor mutations resulting in pre- and post-natal growth retardation. Manuscript in 
preparation. 

Aggarwal S, Xie M-H, Foster J, Frantz G, Stinson J, Corpuz RT, Simmons L, Hillan K, 
Yansura DG, Vandlen RL, Goddard AD and Gurney AL. FHFR, a novel receptor for the 
fibroblast growth factors. Manuscript submitted. 

Adams SH, Chui C, Schilbach SL, Yu XX, Goddard AD, Grimaldi JC, Lee J, Dowd P, Colman 
S, Lewin DA. (2001) BFIT, a unique acyl-CoA thioesterase induced in thermogenic brown 
adipose tissue: Cloning, organization of the human gene, and assessment of a potential link 
to obesity. Biochemical Journal 360: 135-142. 

Lee J, Ho WH, Maruoka M, Corpuz RT, Baldwin DT, Foster JS, Goddard AD, Yansura DG, 
Vandlen RL, Wood WI, Gurney AL. (2001) IL-17E, a novel proinflammatory ligand for the 
IL-17 receptor homolog IL-17Rh1. Journal of Biological Chemistry 276(2): 1660-1664. 

Xie M-H, Aggarwal S, Ho W-H, Foster J, Zhang Z, Stinson J, Wood WI, Goddard AD and 
Gurney AL. (2000) Interleukin (IL)-22, a novel human cytokine that signals through the 
interferon-receptor related proteins CRF2-4 and IL-22R. Journal of Biological Chemistry 275: 
31335-31339. 

Weiss GA, Watanabe CK, Zhong A, Goddard A and Sidhu SS. (2000) Rapid mapping of 
protein functional epitopes by combinatorial alanine scanning. Proc. Natl. Acad. Sci. USA 97: 
8950-8954. 

Guo S, Yamaguchi Y, Schilbach S, Wada T, Lee J, Goddard A, French D, Handa H, 
Rosenthal A. (2000) A regulator of transcriptional elongation controls vertebrate neuronal 
development. Nature 408: 366-369. 

Yan M, Wang L-C, Hymowitz SG, Schilbach S, Lee J, Goddard A, de Vos AM, Gao WQ, Dixit 
VM. (2000) Two-amino acid molecular switch in an epithelial morphogen that regulates 
binding to two distinct receptors. Science 290: 523-527. 

Sehl PD, Tai JTN, Hillan KJ, Brown LA, Goddard A, Yang R, Jin H and Lowe DG. (2000) 
Application of cDNA microarrays in determining molecular phenotype in cardiac growth, 
development, and response to injury. Circulation 101: 1990-1999. 

Guo S, Brush J, Teraoka H, Goddard A, Wilson SW, Mullins MC and Rosenthal A. (1999) 
Development of noradrenergic neurons in the zebrafish hindbrain requires BMP, FGF8, and 
the homeodomain protein soulless/Phox2A. Neuron 24: 555-566. 

Stone D, Murone M, Luoh S, Ye W, Armanini P, Gurney A, Phillips HS, Brush J, Goddard 
A, de Sauvage FJ and Rosenthal A. (1999) Characterization of the human suppressor of 
fused; a negative regulator of the zinc-finger transcription factor Gli. J. Cell Sci. 112: 
4437-4448. 

Xie M-H, Holcomb I, Deuel B, Dowd P, Huang A, Vagts A, Foster J, Liang J, Brush J, Gu Q, 
Hillan K, Goddard A and Gurney AL. (1999) FGF-19, a novel fibroblast growth factor with 
unique specificity for FGFR4. Cytokine 11: 729-735. 

Yan M, Lee J, Schilbach S, Goddard A and Dixit V. (1999) mE10, a novel caspase 
recruitment domain-containing proapoptotic molecule. J. Biol. Chem. 274(15): 10287-10292. 

Gurney AL, Marsters SA, Huang RM, Pitti RM, Mark DT, Baldwin DT, Gray AM, Dowd P, 
Brush J, Heldens S, Schow P, Goddard AD, Wood WI, Baker KP, Godowski PJ and 
Ashkenazi A. (1999) Identification of a new member of the tumor necrosis factor family and its 
receptor, a human ortholog of mouse GITR. Current Biology 9(4): 215-218. 

Ridgway JBB, Ng E, Kern JA, Lee J, Brush J, Goddard A and Carter P. (1999) Identification 
of a human anti-CD55 single-chain Fv by subtractive panning of a phage library using tumor 
and nontumor cell lines. Cancer Research 59: 2718-2723. 

Pitti RM, Marsters SA, Lawrence DA, Roy M, Kischkel FC, Dowd P, Huang A, Donahue CJ, 
Sherwood SW, Baldwin DT, Godowski PJ, Wood WI, Gurney AL, Hillan KJ, Cohen RL, 
Goddard AD, Botstein D and Ashkenazi A. (1998) Genomic amplification of a decoy receptor 
for Fas ligand in lung and colon cancer. Nature 396(6712): 699-703. 

Pennica D, Swanson TA, Welsh JW, Roy MA, Lawrence DA, Lee J, Brush J, Taneyhill LA, 
Deuel B, Lew M, Watanabe C, Cohen RL, Melhem MF, Finley GG, Quirke P, Goddard AD, 
Hillan KJ, Gurney AL, Botstein D and Levine AJ. (1998) WISP genes are members of the 
connective tissue growth factor family that are up-regulated in Wnt-1-transformed cells and 
aberrantly expressed in human colon tumors. Proc. Natl. Acad. Sci. USA 95(25): 
14717-14722. 

Yang RB, Mark MR, Gray A, Huang A, Xie MH, Zhang M, Goddard A, Wood WI, Gurney AL 
and Godowski PJ. (1998) Toll-like receptor-2 mediates lipopolysaccharide-induced cellular 
signalling. Nature 395(6699): 284-288. 

Merchant AM, Zhu Z, Yuan JQ, Goddard A, Adams CW, Presta LG and Carter P. (1998) An 
efficient route to human bispecific IgG. Nature Biotechnology 16(7): 677-681. 

Marsters SA, Sheridan JP, Pitti RM, Brush J, Goddard A and Ashkenazi A. (1998) 
Identification of a ligand for the death-domain-containing receptor Apo3. Current Biology 8(9): 
525-528. 

Xie J, Murone M, Luoh SM, Ryan A, Gu Q, Zhang C, Bonifas JM, Lam CW, Hynes M, 
Goddard A, Rosenthal A, Epstein EH Jr. and de Sauvage FJ. (1998) Activating Smoothened 
mutations in sporadic basal-cell carcinoma. Nature. 391(6662): 90-92. 

Marsters SA, Sheridan JP, Pitti RM, Huang A, Skubatch M, Baldwin D, Yuan J, Gurney A, 
Goddard AD, Godowski P and Ashkenazi A. (1997) A novel receptor for Apo2L/TRAIL 
contains a truncated death domain. Current Biology. 7(12): 1003-1006. 

Hynes M, Stone DM, Dowd M, Pitts-Meek S, Goddard A, Gurney A and Rosenthal A. (1997) 
Control of cell pattern in the neural tube by the zinc finger transcription factor Gli-1. Neuron 
19: 15-26. 

Sheridan JP, Marsters SA, Pitti RM, Gurney A, Skubatch M, Baldwin D, Ramakrishnan L, 
Gray CL, Baker K, Wood WI, Goddard AD, Godowski P, and Ashkenazi A. (1997) Control of 
TRAIL-Induced Apoptosis by a Family of Signaling and Decoy Receptors. Science 
277(5327): 818-821. 

Goddard AD, Dowd P, Chernausek S, Geffner M, Gertner J, Hintz R, Hopwood N, Kaplan S, 
Plotnick L, Rogol A, Rosenfield R, Saenger P, Mauras N, Hershkopf R, Angulo M and Attie K. 
(1997) Partial growth hormone insensitivity: The role of growth hormone receptor mutations in 
idiopathic short stature. J. Pediatr. 131: S51-55. 

Klein RD, Sherman D, Ho WH, Stone D, Bennett GL, Moffat B, Vandlen R, Simmons L, Gu Q, 
Hongo JA, Devaux B, Poulsen K, Armanini M, Nozaki C, Asai N, Goddard A, Phillips H, 
Henderson CE, Takahashi M and Rosenthal A. (1997) A GPI-linked protein that interacts with 
Ret to form a candidate neurturin receptor. Nature. 387(6634): 717-21. 

Stone DM, Hynes M, Armanini M, Swanson TA, Gu Q, Johnson RL, Scott MP, Pennica D, 
Goddard A, Phillips H, Noll M, Hooper JE, de Sauvage F and Rosenthal A. (1996) The 
tumour-suppressor gene patched encodes a candidate receptor for Sonic hedgehog. Nature 
384(6605): 129-34. 

Marsters SA, Sheridan JP, Donahue CJ, Pitti RM, Gray CL, Goddard AD, Bauer KD and 
Ashkenazi A. (1996) Apo-3, a new member of the tumor necrosis factor receptor family, 
contains a death domain and activates apoptosis and NF-kappa B. Current Biology 6(12): 
1669-76. 

Rothe M, Xiong J, Shu HB, Williamson K, Goddard A and Goeddel DV. (1996) I-TRAF is a 
novel TRAF-interacting protein that regulates TRAF-mediated signal transduction. Proc. Natl. 
Acad. Sci. USA 93: 8241-8246. 

Yang M, Luoh SM, Goddard A, Reilly D, Henzel W and Bass S. (1996) The bglX gene 
located at 47.8 min on the Escherichia coli chromosome encodes a periplasmic 
beta-glucosidase. Microbiology 142: 1659-65. 

Goddard AD and Black DM. (1996) Familial Cancer in Molecular Endocrinology of Cancer. 
Waxman, J. Ed. Cambridge University Press, Cambridge UK, pp. 187-215. 

Treanor JJS, Goodman L, de Sauvage F, Stone DM, Poulson KT, Beck CD, Gray C, Armanini 
MP, Pollocks RA, Hefti F, Phillips HS, Goddard A, Moore MW, Buj-Bello A, Davis AM, Asai N, 
Takahashi M, Vandlen R, Henderson CE and Rosenthal A. (1996) Characterization of a 
receptor for GDNF. Nature 382: 80-83. 

Klein RD, Gu Q, Goddard A and Rosenthal A. (1996) Selection for genes encoding secreted 
proteins and receptors. Proc. Natl. Acad. Sci. USA 93: 7108-7113. 

Winslow JW, Moran P, Valverde J, Shih A, Yuan JQ, Wong SC, Tsai SP, Goddard A, Henzel 
WJ, Hefti F and Caras I. (1995) Cloning of AL-1, a ligand for an Eph-related tyrosine kinase 
receptor involved in axon bundle formation. Neuron 14: 973-981. 

Bennett BD, Zeigler FC, Gu Q, Fendly B, Goddard AD, Gillett N and Matthews W. (1995) 
Molecular cloning of a ligand for the EPH-related receptor protein-tyrosine kinase Htk. Proc. 
Natl. Acad. Sci. USA 92: 1866-1870. 

Huang X, Yuan J, Goddard A, Foulis A, James RF, Lernmark A, Pujol-Borrell R, 
Rabinovitch A, Somoza N and Stewart TA. (1995) Interferon expression in the pancreases of 
patients with type I diabetes. Diabetes 44: 658-664. 

Goddard AD, Yuan JQ, Fairbairn L, Dexter M, Borrow J, Kozak C and Solomon E. (1995) 
Cloning of the murine homolog of the leukemia-associated PML gene. Mammalian Genome 
6: 732-737. 

Goddard AD, Covello R, Luoh SM, Clackson T, Attie KM, Gesundheit N, Rundle AC, Wells 
JA, Carlsson LMS and The Growth Hormone Insensitivity Study Group. (1995) Mutations of 
the growth hormone receptor in children with idiopathic short stature. N. Engl. J. Med. 333: 
1093-1098. 

Kuo SS, Moran P, Gripp J, Armanini M, Phillips HS, Goddard A and Caras IW. (1994) 
Identification and characterization of Batk, a predominantly brain-specific non-receptor protein 
tyrosine kinase related to Csk. J. Neurosci. Res. 38: 705-715. 

Mark MR, Scadden DT, Wang Z, Gu Q, Goddard A and Godowski PJ. (1994) Rse, a novel 
receptor-type tyrosine kinase with homology to Axl/Ufo, is expressed at high levels in the 
brain. Journal of Biological Chemistry 269: 10720-10728. 

Borrow J, Shipley J, Howe K, Kiely F, Goddard A, Sheer D, Srivastava A, Antony AC, 
Fioretos T, Mitelman F and Solomon E. (1994) Molecular analysis of simple variant 
translocations in acute promyelocytic leukemia. Genes Chromosomes Cancer 9: 234-243. 

Goddard AD and Solomon E. (1993) Genetics of Cancer. Adv. Hum. Genet. 21: 321-376. 

Borrow J, Goddard AD, Gibbons B, Katz F, Swirsky D, Fioretos T, Dube I, Winfield DA, 
Kingston J, Hagemeijer A, Rees JKH, Lister AT and Solomon E. (1992) Diagnosis of acute 
promyelocytic leukemia by RT-PCR: Detection of PML-RARA and RARA-PML fusion 
transcripts. Br. J. Haematol. 82: 529-540. 

Goddard AD, Borrow J and Solomon E. (1992) A previously uncharacterized gene, PML, is 
fused to the retinoic acid receptor alpha gene in acute promyelocytic leukemia. Leukemia 6 
Suppl 3: 117S-119S. 

Zhu X, Dunn JM, Goddard AD, Squire JA, Becker A, Phillips RA and Gallie BL. (1992) 

We have enhanced the polymerase chain reaction (PCR) such that specific DNA sequences 
can be detected without opening the reaction tube. This enhancement requires the addition 
of ethidium bromide (EtBr) to a PCR. Since the fluorescence of EtBr increases in the 
presence of double-stranded (ds) DNA, an increase in fluorescence in such a PCR indicates 
a positive amplification, which can be easily monitored externally. In fact, amplification 
can be continuously monitored in order to follow its progress. The ability to simultaneously 
amplify specific DNA sequences and detect the product of the amplification both simplifies 
and improves PCR and may facilitate its automation and more widespread use in the clinic 
or in other situations requiring high sample throughput. 

Although the potential benefits of PCR¹ to clinical diagnostics are well known, it is still 
not widely used in this setting, even though it is four years since thermostable DNA 
polymerases⁴ made PCR practical. Some of the reasons for its slow acceptance are high cost, 
lack of automation of pre- and post-PCR processing steps, and false positive results from 
carryover-contamination. The first two points are related in that labor is the largest 
contributor to cost at the present stage of PCR development. Most current assays require 
some form of "downstream" processing once thermocycling is done in order to determine 
whether the target DNA sequence was present and has amplified. These include DNA 
hybridization, gel electrophoresis with or without use of restriction digestion, HPLC, or 
capillary electrophoresis¹⁰. These methods are labor-intense, have low throughput, and are 
difficult to automate. The third point is also closely related to downstream processing. 
The handling of the PCR product in these downstream processes increases the chances that 
amplified DNA will spread through the typing lab, resulting in a risk of "carryover" false 
positives in subsequent testing¹¹. 

These downstream processing steps would be eliminated if specific amplification and 
detection of amplified DNA took place simultaneously within an unopened reaction vessel. 
Assays in which such different processes take place without the need to separate reaction 
components have been termed "homogeneous". No truly homogeneous PCR assay has been 
demonstrated to date, although progress towards this end has been reported. Chehab, et al., 
developed a PCR product detection scheme using fluorescent primers that resulted in a 
fluorescent PCR product. Allele-specific primers, each with different fluorescent tags, 
were used to indicate the genotype of the DNA. However, the unincorporated primers must 
still be removed in a downstream process in order to visualize the result. Recently, 
Holland, et al.¹⁵, developed an assay in which the endogenous 5' exonuclease activity of 
Taq DNA polymerase was exploited to cleave a labeled oligonucleotide probe. The probe 
would only cleave if PCR amplification had produced its complementary sequence. In order 
to detect the cleavage products, however, a subsequent process is again needed. 

We have developed a truly homogeneous assay for PCR and PCR product detection based upon 
the greatly increased fluorescence that ethidium bromide and other DNA binding dyes exhibit 
when they are bound to ds-DNA. As outlined in Figure 1, a prototypic PCR begins with 
primers that are single-stranded DNA (ss-DNA), dNTPs, and DNA polymerase. An amount of 
dsDNA containing the target sequence (target DNA) is also typically present. This amount 
can vary, depending on the application, from single-cell amounts of DNA¹⁷ to micrograms 
per PCR¹⁸. If EtBr is present, the reagents that will fluoresce, in order of increasing 
fluorescence, are free EtBr itself, and EtBr bound to the single-stranded DNA primers and 
to the double-stranded target DNA (by its intercalation between the stacked bases of the 
DNA double-helix). After the first denaturation cycle, target DNA will be largely 
single-stranded. After a PCR is completed, the most significant change is the increase in 
the amount of dsDNA (the PCR product itself) of up to several micrograms. Formerly free 
EtBr is bound to the additional dsDNA, resulting in an increase in fluorescence. There is 
also some decrease in the amount of ssDNA primer, but because the binding of EtBr to ssDNA 
is much less than to dsDNA, the effect of this change on the total fluorescence of the 
sample is small. The fluorescence increase can be measured by directing excitation 
illumination through the walls of the amplification vessel before and after, or even 
continuously during, thermocycling. 

FIGURE 1 Principle of simultaneous amplification and detection of PCR product. The 
components of a PCR containing EtBr that are fluorescent are listed: EtBr itself, and EtBr 
bound to either ssDNA or dsDNA. There is a large fluorescence enhancement when EtBr is 
bound to DNA, and binding is greatly enhanced when DNA is double-stranded. After sufficient 
(n) cycles of PCR, the net increase in dsDNA results in additional EtBr binding, and a net 
increase in total fluorescence. 

FIGURE 2 Gel electrophoresis of PCR amplification products of the human nuclear gene, 
HLA DQα, made in the presence of increasing amounts of EtBr (up to 8 µg/ml). The presence 
of EtBr has no obvious effect on the yield or specificity of amplification. 

FIGURE 3 (A) Fluorescence measurements from PCRs that contain 0.5 µg/ml EtBr and that are 
specific for Y-chromosome repeat sequences. Five replicate PCRs were begun containing each 
of the DNAs specified. At each indicated cycle, one of the five replicate PCRs for each DNA 
was removed from thermocycling and its fluorescence measured. Units of fluorescence are 
arbitrary. (B) UV photography of PCR tubes (0.5 ml Eppendorf-style, polypropylene 
micro-centrifuge tubes) containing reactions, those starting from 2 ng male DNA and control 
reactions without any DNA, from (A). 

RESULTS 

PCR in the presence of EtBr. In order to assess the effect of EtBr on PCR, amplifications 
of the human HLA DQα gene¹⁹ were performed with the dye present at concentrations from 
0.06 to 8.0 µg/ml (a typical concentration of EtBr used in staining of nucleic acids 
following gel electrophoresis is 0.5 µg/ml). As shown in Figure 2, gel electrophoresis 
revealed little or no difference in the yield or quality of the amplification product 
whether EtBr was absent or present at any of these concentrations, indicating that EtBr 
does not inhibit PCR. 

Detection of human Y-chromosome specific sequences. Sequence-specific fluorescence 
enhancement of EtBr as a result of PCR was demonstrated in a series of amplifications 
containing 0.5 µg/ml EtBr and primers specific to repeat DNA sequences found on the human 
Y-chromosome. These PCRs initially contained either 60 ng male, 60 ng female, 2 ng male 
human or no DNA. Five replicate PCRs were begun for each DNA. After 0, 17, 21, 24 and 29 
cycles of thermocycling, a PCR for each DNA was removed from the thermocycler, and its 
fluorescence measured in a spectrofluorometer and plotted vs. amplification cycle number 
(Fig. 3A). The shape of this curve reflects the fact that by the time an increase in 
fluorescence can be detected, the increase in DNA is becoming linear and not exponential 
with cycle number. As shown, the fluorescence increased about three-fold over the 
background fluorescence for the PCRs containing human male DNA, but did not significantly 
increase for negative control PCRs, which contained either no DNA or human female DNA. The 
more male DNA present to begin with (60 ng versus 2 ng), the fewer cycles were needed to 
give a detectable increase in fluorescence. Gel electrophoresis on the products of these 
amplifications showed that DNA fragments of the expected size were made in the male DNA 
containing reactions and that little DNA synthesis took place in the control samples. 
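
The end-point comparison described above can be sketched in a few lines of Python. This is 
only an illustration: the readings are invented placeholder values in arbitrary units (they 
are not the measurements plotted in Figure 3A), and the 1.5-fold detection threshold and 
the function names are assumptions introduced here, not part of the reported method. 

```python
# Cycle numbers at which replicate PCRs were removed and read (from the text).
CYCLES = [0, 17, 21, 24, 29]


def fold_increase(readings):
    """Divide every reading by the cycle-0 (background) reading."""
    background = readings[0]
    return [value / background for value in readings]


def first_detectable_cycle(cycles, readings, threshold=1.5):
    """Return the first sampled cycle whose fold increase reaches the
    threshold, or None if the threshold is never reached.
    The threshold value is an assumption made for this sketch."""
    for cycle, fold in zip(cycles, fold_increase(readings)):
        if fold >= threshold:
            return cycle
    return None


# Hypothetical readings (arbitrary units), NOT data from the article.
samples = {
    "60 ng male": [100, 160, 250, 290, 310],
    "2 ng male": [100, 102, 130, 210, 300],
    "60 ng female": [100, 101, 99, 103, 102],
    "no DNA": [100, 100, 101, 99, 100],
}

for name, readings in samples.items():
    print(name, first_detectable_cycle(CYCLES, readings))
```

With these placeholder values the reaction with more male DNA crosses the threshold at an 
earlier sampled cycle than the one with less, and the two negative controls never cross it, 
which is the qualitative pattern the paragraph describes. 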

In addition, the increase in fluorescence was visualized by simply laying the completed, 
unopened PCRs on a UV transilluminator and photographing them through a red filter. This is 
shown in Figure 3B for the reactions that began with 2 ng male DNA and those with no DNA. 

Detection of specific alleles of the human β-globin gene. In order to demonstrate that 
this approach has adequate specificity to allow genetic screening, a detection of the 
sickle-cell anemia mutation was performed. Figure 4 shows the fluorescence from completed 
amplifications containing EtBr (0.5 µg/ml) as detected by photography of the reaction tubes 
on a UV transilluminator. These reactions were performed using primers specific for either 
the wild-type or sickle-cell mutation of the human β-globin gene. The specificity for each 
allele is imparted by placing the sickle-mutation site at the terminal 3' nucleotide of one 
primer. By using an appropriate primer annealing temperature, primer extension, and thus 
amplification, can take place only if the 3' nucleotide of the primer is complementary to 
the allele present. 

Each pair of amplifications shown in Figure 4 consists of a reaction with either the 
wild-type allele specific (left tube) or sickle-allele specific (right tube) primers. Three 
different DNAs were typed: DNA from a homozygous, wild-type β-globin individual (AA); from 
a heterozygous sickle β-globin individual (AS); and from a homozygous sickle β-globin 
individual (SS). Each DNA (50 ng genomic DNA to start each PCR) was analyzed in triplicate 
(3 pairs of reactions each). The DNA type was reflected in the relative fluorescence 
intensities in each pair of completed amplifications. There was a significant increase in 
fluorescence only where a β-globin allele DNA matched the primer set. When measured on a 
spectrofluorometer (data not shown), this fluorescence was about three times that present 
in a PCR where both β-globin alleles were mismatched to the primer set. Gel electrophoresis 
(not shown) established that this increase in fluorescence was due to the synthesis of 
nearly a microgram of a DNA fragment of the expected size for β-globin. There was little 
synthesis of dsDNA in reactions in which the allele-specific primer was mismatched to both 
alleles. 
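
The way a genotype follows from each pair of tubes can be written out as a small decision 
rule. The sketch below is hypothetical: the article reports only that a matched reaction 
gave about three times the fluorescence of a fully mismatched one, so the 2-fold cut-off, 
the function name and the example readings are assumptions chosen for illustration. 

```python
def call_genotype(wt_signal, sickle_signal, background, min_fold=2.0):
    """Call AA, AS or SS from the fluorescence of the wild-type-primer tube
    and the sickle-primer tube, relative to a mismatched (background) level.
    min_fold is an assumed cut-off, not a value given in the article."""
    wt_positive = wt_signal >= min_fold * background
    sickle_positive = sickle_signal >= min_fold * background
    if wt_positive and sickle_positive:
        return "AS"
    if wt_positive:
        return "AA"
    if sickle_positive:
        return "SS"
    return "no call"


# Hypothetical readings in arbitrary units.
print(call_genotype(300, 100, 100))  # left tube bright only
print(call_genotype(300, 310, 100))  # both tubes bright
print(call_genotype(95, 290, 100))   # right tube bright only
```

A pair in which neither tube brightens returns "no call" rather than a genotype, since 
that outcome is not expected for any of the three DNA types described. 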

Continuous monitoring of a PCR. Using a fiber optic device, it is possible to direct 
excitation illumination from a spectrofluorometer to a PCR undergoing thermocycling and to 
return its fluorescence to the spectrofluorometer. The fluorescence readout of such an 
arrangement, directed at an EtBr-containing amplification of Y-chromosome specific 
sequences from 25 ng of human male DNA, is shown in Figure 5. The readout from a control 
PCR with no target DNA is also shown. Thirty cycles of PCR were monitored for each. 

The fluorescence trace as a function of time clearly shows the effect of the thermocycling. 
Fluorescence intensity rises and falls inversely with temperature. The fluorescence 
intensity is minimum at the denaturation temperature (94°C) and maximum at the 
annealing/extension temperature (50°C). In the negative-control PCR, these fluorescence 
maxima and minima do not change significantly over the thirty thermocycles, indicating that 
there is little dsDNA synthesis without the appropriate target DNA, and there is little if 
any bleaching of EtBr during the continuous illumination of the sample. 

In the PCR containing male DNA, the fluorescence maxima at the annealing/extension 
temperature begin to increase at about 4000 seconds of thermocycling, and continue to 
increase with time, indicating that dsDNA is being produced at a detectable level. Note 
that the fluorescence minima at the denaturation temperature do not significantly increase, 
presumably because at this temperature there is no dsDNA for EtBr to bind. Thus the course 
of the amplification is followed by tracking the fluorescence increase at the annealing 
temperature. Analysis of the products of these two amplifications by gel electrophoresis 
showed a DNA fragment of the expected size for the male DNA containing sample and no 
detectable DNA synthesis for the control sample. 
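
Tracking the per-cycle fluorescence maxima, as described above, might look as follows in 
Python. This is a sketch under stated assumptions: the trace is synthetic, each cycle is 
taken to contribute a fixed number of samples, and the 5-cycle baseline and 10% rise 
criterion are illustrative choices that do not come from the article. 

```python
def per_cycle_extrema(trace, samples_per_cycle):
    """Split a fluorescence trace into whole cycles and return two lists:
    the maximum (annealing/extension) and minimum (denaturation) per cycle."""
    maxima, minima = [], []
    last_start = len(trace) - samples_per_cycle
    for start in range(0, last_start + 1, samples_per_cycle):
        window = trace[start:start + samples_per_cycle]
        maxima.append(max(window))
        minima.append(min(window))
    return maxima, minima


def first_rising_cycle(maxima, baseline_cycles=5, rel_increase=0.10):
    """Return the 1-based cycle at which the maximum first exceeds the mean of
    the early-cycle maxima by rel_increase, or None. Both parameters are
    assumptions made for this sketch."""
    baseline = sum(maxima[:baseline_cycles]) / baseline_cycles
    for cycle, value in enumerate(maxima, start=1):
        if cycle > baseline_cycles and value > baseline * (1 + rel_increase):
            return cycle
    return None


def synthetic_trace(n_cycles, growth_from=None):
    """Build a made-up trace: low signal at denaturation, high signal at
    annealing/extension, with the high level growing from a chosen cycle."""
    trace = []
    for cycle in range(1, n_cycles + 1):
        high = 100.0
        if growth_from is not None and cycle >= growth_from:
            high += 15.0 * (cycle - growth_from + 1)
        trace.extend([40.0, high, high, 40.0])
    return trace


positive_max, positive_min = per_cycle_extrema(synthetic_trace(30, growth_from=20), 4)
control_max, control_min = per_cycle_extrema(synthetic_trace(30), 4)
print(first_rising_cycle(positive_max), first_rising_cycle(control_max))
```

In the synthetic positive trace only the maxima grow while the minima stay flat, mirroring 
the observation that the denaturation-temperature signal does not increase. 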

DISCUSSION 

Downstream processes such as hybridization to a sequence-specific probe can enhance the 
specificity of DNA detection by PCR. The elimination of these processes means that the 
specificity of this homogeneous assay depends solely on that of PCR. In the case of 
sickle-cell disease, we have shown that PCR alone has sufficient DNA sequence specificity 
to permit genetic screening. Using appropriate amplification conditions, there is little 
non-specific production of dsDNA in the absence of the appropriate target allele. 

The specificity required to detect pathogens can be more or less than that required to do 
genetic screening, depending on the number of pathogens in the sample and the amount of 
other DNA that must be taken with the sample. A difficult target is HIV, which requires 
detection of a viral genome that can be at the level of a few copies per thousands of host 
cells. Compared with genetic screening, which is performed on cells containing at least one 
copy of the target sequence, HIV detection requires both more specificity and the input of 
more total DNA (up to microgram amounts) in order to have sufficient numbers of target 
sequences. This large amount of starting DNA in an amplification significantly increases 
the background fluorescence over which any additional fluorescence produced by PCR must be 
detected. An additional complication that occurs with targets in low copy-number is the 
formation of the "primer-dimer" artifact. This is the result of the extension of one primer 
using the other primer as a template. Although this occurs infrequently, once it occurs the 
extension product is a substrate for PCR amplification, and can compete with true PCR 
targets if those targets are rare. The primer-dimer product is of course dsDNA and thus is 
a potential source of false signal in this homogeneous assay. 

FIGURE 4 UV photography of PCR tubes containing amplifications using EtBr that are specific 
to wild-type (A) or sickle (S) alleles of the human β-globin gene. The left of each pair of 
tubes contains allele-specific primers to the wild-type alleles, the right tube primers to 
the sickle allele. The photograph was taken after 30 cycles of PCR, and the input DNAs and 
the alleles they contain are indicated. Fifty ng of DNA was used to begin PCR. Typing was 
done in triplicate (3 pairs of PCRs) for each input DNA. 

FIGURE 5 Continuous, real-time monitoring of a PCR. A fiber optic was used to carry 
excitation light to a PCR in progress and also emitted light back to a fluorometer (see 
Experimental Protocol). Amplification using human male-DNA specific primers in a PCR 
starting with 20 ng of human male DNA (top), or in a control PCR without DNA (bottom), were 
monitored. Thirty cycles of PCR were followed for each. The temperature cycled between 94°C 
(denaturation) and 50°C (annealing and extension). Note in the male DNA PCR, the cycle 
(time) dependent increase in fluorescence at the annealing/extension temperature. 

To increase PCR specificity and reduce the effect of primer-dimer amplification, we are 
investigating a number of approaches, including the use of nested-primer amplifications 
that take place in a single tube, and the "hot-start", in which nonspecific amplification 
is reduced by raising the temperature of the reaction before DNA synthesis begins. 
Preliminary results using these approaches suggest that primer-dimer is effectively reduced 
and it is possible to detect the increase in EtBr fluorescence in a PCR instigated by a 
single HIV genome in a background of 10⁵ cells. With larger numbers of cells, the 
background fluorescence contributed by genomic DNA becomes problematic. To reduce this 
background, it may be possible to use sequence-specific DNA-binding dyes that can be made 
to preferentially bind PCR product over genomic DNA by incorporating the dye-binding DNA 
sequence into the PCR product through a 5' "add-on" to the oligonucleotide primer. 

We have shown that the detection of fluorescence generated by an EtBr-containing PCR is 
straightforward, both once PCR is completed and continuously during thermocycling. The ease 
with which automation of specific DNA detection can be accomplished is the most promising 
aspect of this assay. The fluorescence analysis of completed PCRs is already possible with 
existing instrumentation in 96-well format. In this format, the fluorescence in each PCR 
can be quantitated before, after, and even at selected points during thermocycling by 
moving the rack of PCRs to a 96-microwell plate fluorescence reader. 

The instrumentation necessary to continuously monitor multiple PCRs simultaneously is also 
simple in principle. A direct extension of the apparatus used here is to have multiple 
fiberoptics transmit the excitation light and fluorescent emissions to and from multiple 
PCRs. The ability to monitor multiple PCRs continuously may allow quantitation of target 
DNA copy number. Figure 3 shows that the larger the amount of starting target DNA, the 
sooner during PCR a fluorescence increase is detected. Preliminary experiments (Higuchi and 
Dollinger, manuscript in preparation) with continuous monitoring have shown a sensitivity 
to two-fold differences in initial target DNA concentration. 
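
The link between starting amount and the cycle at which fluorescence appears can be made 
concrete with an idealized model. The sketch below assumes purely exponential amplification 
with a constant per-cycle efficiency, which is a simplification (the Results note that 
growth is already becoming linear by the time it is detectable); the function names and the 
efficiency parameter are introduced here for illustration only. 

```python
import math


def cycle_offset(amount_high, amount_low, efficiency=1.0):
    """Extra cycles the lower-input reaction needs to reach the same product
    level, assuming product grows by a factor (1 + efficiency) per cycle."""
    return math.log(amount_high / amount_low) / math.log(1.0 + efficiency)


def relative_input(delta_cycles, efficiency=1.0):
    """Inverse relation: fold difference in starting target implied by a
    given difference in detection cycle, under the same assumption."""
    return (1.0 + efficiency) ** delta_cycles


# 60 ng versus 2 ng of male DNA is a 30-fold difference in input.
print(round(cycle_offset(60, 2), 2))
# A two-fold difference in input corresponds to one cycle at perfect doubling.
print(cycle_offset(2, 1))
```

Under perfect doubling a two-fold difference in starting target shifts detection by a 
single cycle, which gives a sense of the resolution that a sensitivity to two-fold 
differences would require. 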

Conversely, if the number of target molecules is known, as it can be in genetic screening, 
continuous monitoring may provide a means of detecting false positive and false negative 
results. With a known number of target molecules, a true positive would exhibit detectable 
fluorescence by a predictable number of cycles of PCR. Increases in fluorescence detected 
before or after that cycle would indicate potential artifacts. False negative results due 
to, for example, inhibition of DNA polymerase, may be detected by including within each PCR 
an inefficiently amplifying marker. This marker results in a fluorescence increase only 
after a large number of cycles, many more than are necessary to detect a true positive. If 
a sample fails to have a fluorescence increase after this many cycles, inhibition may be 
suspected. Since, in this assay, conclusions are drawn based on the presence or absence of 
fluorescence signal alone, such controls may be important. In any event, before any test 
based on this principle is ready for the clinic, an assessment of its false positive/false 
negative rates will need to be obtained using a large number of known samples. 
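
The control logic proposed above can be expressed as a short decision function. This is a 
hypothetical sketch: the article gives no numerical cycle values or tolerances, so the 
parameter names, the 2-cycle tolerance and the example cycle numbers are all assumptions. 

```python
def interpret(detected_cycle, expected_cycle, marker_cycle, tolerance=2):
    """Interpret the cycle at which fluorescence first rose.

    detected_cycle: cycle of first fluorescence increase, or None if none seen
    expected_cycle: cycle predicted for a true positive with known target input
    marker_cycle:   cycle at which the inefficient internal marker is expected
    tolerance:      allowed deviation in cycles (an assumed value)
    """
    if detected_cycle is None:
        # Not even the internal marker amplified.
        return "inhibition suspected"
    if abs(detected_cycle - expected_cycle) <= tolerance:
        return "positive"
    if detected_cycle >= marker_cycle - tolerance:
        # Only the late, inefficient marker produced signal.
        return "negative"
    return "possible artifact"


# Hypothetical example: true positives expected near cycle 25, marker near 40.
for observed in (24, 41, 15, 32, None):
    print(observed, interpret(observed, expected_cycle=25, marker_cycle=40))
```

The sketch treats a rise that is clearly earlier or later than predicted, but still well 
before the marker, as a possible artifact, in line with the statement that increases 
detected before or after the expected cycle would indicate potential artifacts. 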

In summary, the inclusion in PCR of dyes whose fluorescence is enhanced upon binding dsDNA 
makes it possible to detect specific DNA amplification from outside the PCR tube. In the 
future, instruments based upon this principle may facilitate the more widespread use of PCR 
in applications that demand the high throughput of samples. 

EXPERIMENTAL PROTOCOL 

Human HLA-DQα gene amplifications containing EtBr. PCRs were set up in 100 µl volumes 
containing 10 mM Tris-HCl, pH 8.3; 50 mM KCl; 4 mM MgCl2; 2.5 units of Taq DNA polymerase 
(Perkin-Elmer Cetus, Norwalk, CT); 20 pmole each of human HLA-DQα gene specific 
oligonucleotide primers GH26 and GH27; and approximately 10[illegible] copies of DQα PCR 
product diluted from a previous reaction. Ethidium bromide (EtBr; Sigma) was used at the 
concentrations indicated in Figure 2. Thermocycling proceeded for 20 cycles in a model 480 
thermocycler (Perkin-Elmer Cetus, Norwalk, CT) using a "step-cycle" program of 94°C for 
1 min. denaturation and 60°C for [illegible] sec. annealing and 72°C for 30 sec. extension. 

Y-chromosome specific PCR. PCRs (100 µl total reaction
volume) containing [illegible] EtBr were prepared as described
for HLA-DQα, except with different primers and target DNAs.
These PCRs contained 15 pmole each male DNA-specific primer
Y1.1 and Y1.2, and either 60 ng male, 60 ng female, 2 ng male,
or no human DNA. Thermocycling was 94°C for 1 min. and [illegible]°C
for 1 min using a "step-cycle" program. The number of cycles for
a sample were as indicated in Figure 3. Fluorescence measure-
ment is described below.

Allele-specific, human β-globin gene PCR. Amplifications of
100 µl volume containing 0.5 µg/ml of EtBr were prepared as
described for HLA-DQα above except with different primers and
target DNAs. These PCRs contained either primer pair [illegible]
(wild-type globin specific primers) or [illegible] (sickle-
globin specific primers) at 10 pmole each primer per PCR.
These primers were designed by Wu et al.²¹ Three different
target DNAs were used in separate amplifications: 50 ng each of
human DNA that was homozygous for the sickle trait (SS), DNA
that was heterozygous for the sickle trait (AS), or DNA that was
homozygous for the w.t. globin (AA). Thermocycling was for 30
cycles at 94°C for 1 min. and 55°C for 1 min. using a "step-cycle"
program. An annealing temperature of 55°C had been shown by
Wu et al.²¹ to provide allele-specific amplification. Completed
PCRs were photographed through a red filter (Wratten [illegible]A)
after placing the reaction tubes atop a model TM-36 transillumi-
nator (UV-Products, San Gabriel, CA).

Fluorescence measurement. Fluorescence measurements were
made on PCRs containing EtBr in a Fluorolog-2 fluorometer
(SPEX, Edison, NJ). Excitation was at the 500 nm band with
about [illegible] nm bandwidth with a [illegible] nm cut-off filter (Melles
Griot, Inc., Irvine, CA) to exclude second-order light. Emitted
light was detected at 570 nm with a bandwidth of about 7 nm. An
OG 530 nm cut-off filter was used to remove the excitation light.

Continuous fluorescence monitoring of PCR. Continuous
monitoring of a PCR in progress was accomplished using the
spectrofluorometer and settings described above as well as a
fiberoptic accessory (SPEX cat. no. 1950) to both send excitation
light to, and receive emitted light from, a PCR placed in a well of
a model 480 thermocycler (Perkin-Elmer Cetus). The probe end
of the fiberoptic cable was attached with "5 minute-epoxy" to the
open top of a PCR tube (a 0.5 ml polypropylene centrifuge tube
with its cap removed) effectively sealing it. The exposed top of
the PCR tube and the end of the fiberoptic cable were shielded
from room light and the room lights were kept dimmed during
each run. The monitored PCR was an amplification of Y-chro-
mosome-specific repeat sequences as described above, except
using an annealing/extension temperature of 50°C. The reaction
was covered with mineral oil (2 drops) to prevent evaporation.
Thermocycling and fluorescence measurement were started si-
multaneously. A time-base scan with a 10 second integration time
was used and the emission signal was ratioed to the excitation
signal to control for changes in light-source intensity. Data were
collected using the dm3000f, version 15 (SPEX) data system.
Oligonucleotides with Fluorescent Dyes at
Opposite Ends Provide a Quenched Probe 
System Useful for Detecting PCR Product 
and Nucleic Acid Hybridization 

Kenneth J. Livak, Susan J.A. Flood, Jeffrey Marmaro, William Giusti, and Karin Deetz 

Perkin-Elmer, Applied Biosystems Division, Foster City, California 94404



The 5' nuclease PCR assay detects the
accumulation of specific PCR product
by hybridization and cleavage of a
double-labeled fluorogenic probe
during the amplification reaction.
The probe is an oligonucleotide with
both a reporter fluorescent dye and a
quencher dye attached. An increase
in reporter fluorescence intensity in-
dicates that the probe has hybridized
to the target PCR product and has
been cleaved by the 5'->3' nucle-
olytic activity of Taq DNA polymerase.
In this study, probes with the
quencher dye attached to an internal
nucleotide were compared with
probes with the quencher dye at-
tached to the 3'-end nucleotide. In all
cases, the reporter dye was attached
to the 5' end. All intact probes
showed quenching of the reporter
fluorescence. In general, probes with
the quencher dye attached to the 3'-
end nucleotide exhibited a larger sig-
nal in the 5' nuclease PCR assay than
the internally labeled probes. It is
proposed that the larger signal is
caused by increased likelihood of
cleavage by Taq DNA polymerase
when the probe is hybridized to a
template strand during PCR. Probes
with the quencher dye attached to
the 3'-end nucleotide also exhibited
an increase in reporter fluorescence
intensity when hybridized to a com-
plementary strand. Thus, oligonucle-
otides with reporter and quencher
dyes attached at opposite ends can
be used as homogeneous hybridiza-
tion probes.



A homogeneous assay for detecting
the accumulation of specific PCR prod-
uct that uses a double-labeled fluoro-
genic probe was described by Lee et al. (1)
The assay exploits the 5'->3' nucle-
olytic activity of Taq DNA poly-
merase (2,3) and is diagramed in Figure 1.
The fluorogenic probe consists of an oli-
gonucleotide with a reporter fluorescent
dye, such as a fluorescein, attached to
the 5' end; and a quencher dye, such as a
rhodamine, attached internally. When
the fluorescein is excited by irradiation,
its fluorescent emission will be
quenched if the rhodamine is close
enough to be excited through the pro-
cess of fluorescence energy transfer
(FET). (4,5) During PCR, if the probe is hy-
bridized to a template strand, Taq DNA
polymerase will cleave the probe be-
cause of its inherent 5'->3' nucleolytic
activity. If the cleavage occurs between
the fluorescein and rhodamine dyes, it
causes an increase in fluorescein fluores-
cence intensity because the fluorescein
is no longer quenched. The increase in
fluorescein fluorescence intensity indi-
cates that the probe-specific PCR product
has been generated. Thus, FET between a
reporter dye and a quencher dye is criti-
cal to the performance of the probe in
the 5' nuclease PCR assay.

Quenching is completely dependent 
on the physical proximity of the two 
dyes. (6) Because of this, it has been as- 
sumed that the quencher dye must be 
attached near the 5' end. Surprisingly, 
we have found that attaching a rho- 
damine dye at the 3' end of a probe 
still provides adequate quenching for 
the probe to perform in the 5' nuclease 



PCR assay. Furthermore, cleavage of this 
type of probe is not required to achieve 
some reduction in quenching. Oligonu- 
cleotides with a reporter dye on the 5' 
end and a quencher dye on the 3' end 
exhibit a much higher reporter fluores- 
cence when double-stranded as com- 
pared with single-stranded. This should 
make it possible to use this type of dou- 
ble-labeled probe for homogeneous de- 
tection of nucleic acid hybridization. 

MATERIALS AND METHODS 
Oligonucleotides 

Table 1 shows the nucleotide sequence 
of the oligonucleotides used in this 
study. Linker arm nucleotide (LAN) 
phosphoramidite was obtained from 
Glen Research. The standard DNA phos-
phoramidites, 6-carboxyfluorescein (6-
FAM) phosphoramidite, 6-carboxytet-
ramethylrhodamine succinimidyl ester
(TAMRA NHS ester), and Phosphalink
for attaching a 3'-blocking phosphate,
were obtained from Perkin-Elmer, Ap-
plied Biosystems Division. Oligonucle-
otide synthesis was performed using an
ABI model 394 DNA synthesizer (Applied 
Biosystems). Primer and complement 
oligonucleotides were purified using 
Oligo Purification Cartridges (Applied 
Biosystems). Double-labeled probes were 
synthesized with 6-FAM-labeled phos- 
phoramidite at the 5' end, LAN replacing 
one of the T's in the sequence, and Phos- 
phalink at the 3' end. Following de- 
protection and ethanol precipitation, 
TAMRA NHS ester was coupled to the 
LAN-containing oligonucleotide in 250 
FIGURE 1 Diagram of 5' nuclease assay. Stepwise representation of the 5'->3'
nucleolytic activity of Taq DNA polymerase acting on a fluorogenic probe during one extension phase of PCR.



mM Na-bicarbonate buffer (pH 9.0) at
room temperature. Unreacted dye was
removed by passage over a PD-10 Sepha-
dex column. Finally, the double-labeled
probe was purified by preparative high-
performance liquid chromatography
(HPLC) using an Aquapore C8 220x4.6-
mm column with 7-µm particle size. The
column was developed with a 24-min
linear gradient of 8-20% acetonitrile in
0.1 M TEAA (triethylamine acetate).
Probes are named by designating the se-
quence from Table 1 and the position of
the LAN-TAMRA moiety. For example,
probe A1-7 has sequence A1 with LAN-
TAMRA at nucleotide position 7 from the
5' end.
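
The naming convention can be illustrated with a short sketch. It assumes the notation used in the Figure 2 probe listing, where R marks the 5' reporter, Q marks the T replaced by LAN-TAMRA, and p marks the 3' phosphate; the function name `labeled_probe` is hypothetical, and the A1 sequence is the one shown in that listing.

```python
def labeled_probe(sequence: str, tamra_position: int) -> str:
    """Render a probe in R...Q...p notation.

    sequence: unlabeled probe sequence, written 5'->3'.
    tamra_position: 1-based position (from the 5' end) of the T that is
        replaced by the linker arm nucleotide carrying TAMRA.
    """
    base = sequence[tamra_position - 1]
    if base != "T":
        # LAN replaces one of the T's, so any other base is not a valid site.
        raise ValueError(f"position {tamra_position} is {base}, not T")
    body = sequence[:tamra_position - 1] + "Q" + sequence[tamra_position:]
    return "R" + body + "p"


A1 = "ATGCCCTCCCCCATGCCATCCTGCGT"
print(labeled_probe(A1, 7))  # RATGCCCQCCCCCATGCCATCCTGCGTp
```

Calling the function with positions 2, 14, 19, 22, or 26 gives the other members of the A1 series; any other position raises an error because it is not a T in this sequence.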



PCR Systems 

All PCR amplifications were performed
in the Perkin-Elmer GeneAmp PCR Sys-
tem 9600 using 50-µl reactions that con-
tained 10 mM Tris-HCl (pH 8.3), 50 mM
KCl, 200 µM dATP, 200 µM dCTP, 200 µM
dGTP, 400 µM dUTP, 0.5 unit of AmpEr-
ase uracil N-glycosylase (Perkin-Elmer),
and 1.25 unit of AmpliTaq DNA poly-
merase (Perkin-Elmer). A 295-bp seg-
ment from exon 3 of the human β-actin
gene (nucleotides 2141-2435 in the se-
quence of Nakajima-Iijima et al.) (7) was
amplified using primers AFP and ARP
(Table 1), which are modified slightly
from those of du Breuil et al. (8) Actin am-
plification reactions contained 4 mM
MgCl2, 20 ng of human genomic DNA,
50 nM A1 or A3 probe, and 300 nM each
primer. The thermal regimen was 50°C
(2 min), 95°C (10 min), 40 cycles of 95°C
(20 sec), 60°C (1 min), and hold at 72°C.
A 515-bp segment was amplified from a
plasmid that consists of a segment of λ
DNA (nucleotides 32,220-32,747) in-
serted in the SmaI site of vector pUC119.
These reactions contained 3.5 mM
MgCl2, 1 ng of plasmid DNA, 50 nM P2 or
P5 probe, 200 nM primer F119, and 200
nM primer R119. The thermal regimen
was 50°C (2 min), 95°C (10 min), 25 cy-
cles of 95°C (20 sec), 57°C (1 min), and
hold at 72°C.

TABLE 1 Sequences of Oligonucleotides

Name   Type         Sequence
F119   primer       ACCCACAGGAACTGATCACCACTC
R119   primer       ATGTCGCGTTCCGGCTGACGTTCTGC
P2     probe        TCGCATTACTGATCGTTGCCAACCAGTp
P2C    complement   CTACTGGTTGGCAACGATCAGTAATGCGATG
P5     probe        CGGATTTGCTGGTATCTATGACAAGGATp
P5C    complement   TTCATCCTTGTCATAGATACCAGCAAATCCG
AFP    primer       TCACCCACACTGTGCCCATCTACGA
ARP    primer       CAGCGGAACCGCTCATTGCCAATGG
A1     probe        ATGCCCTCCCCCATGCCATCCTGCGTp
A1C    complement   AGACGCAGGATGGCATGGGGGAGGGCATAC
A3     probe        CGCCCTGGACTTCGAGCAAGAGATp
A3C    complement   CCATCTCTTGCTCGAAGTCCAGGGCGAC

...substituted for a T. (p) The presence of a 3' phosphate on each probe.

Fluorescence Detection

For each amplification reaction, a 40-µl
aliquot of a sample was transferred to an
individual well of a white, 96-well micro-
titer plate (Perkin-Elmer). Fluorescence
was measured on the Perkin-Elmer Taq-
Man LS-50B System, which consists of a
luminescence spectrometer with plate
reader assembly, a 485-nm excitation fil-
ter, and a 515-nm emission filter. Excita-
tion was at 488 nm using a 5-nm slit
width. Emission was measured at 518
nm for 6-FAM (the reporter or R value)
and 582 nm for TAMRA (the quencher or
Q value) using a 10-nm slit width. To
determine the increase in reporter emis-
sion that is caused by cleavage of the
probe during PCR, three normalizations
are applied to the raw emission data.
First, emission intensity of a buffer blank
is subtracted for each wavelength. Sec-
ond, emission intensity of the reporter is
divided by the emission intensity of the
quencher to give an RQ ratio for each
reaction tube. This normalizes for well-
to-well variations in probe concentra-
tion and fluorescence measurement. Fi-
nally, ΔRQ is calculated by subtracting
the RQ value of the no-template control
(RQ-) from the RQ value for the com-
plete reaction including template
(RQ+).

Probe    Sequence
A1-2     RAQGCCCTCCCCCATGCCATCCTGCGTp
A1-7     RATGCCCQCCCCCATGCCATCCTGCGTp
A1-14    RATGCCCTCCCCCAQGCCATCCTGCGTp
A1-19    RATGCCCTCCCCCATGCCAQCCTGCGTp
A1-22    RATGCCCTCCCCCATGCCATCCQGCGTp
A1-26    RATGCCCTCCCCCATGCCATCCTGCGQp

         518 nm                          582 nm
Probe    no temp.        + temp.         no temp.       + temp.        RQ-            RQ+            ΔRQ
A1-2     25.5 ± 2.1      32.7 ± 1.9      38.2 ± 3.0     38.2 ± 2.0     0.67 ± 0.01    0.86 ± 0.06    0.19 ± 0.06
A1-7     53.5 ± 4.3      395.1 ± 21.4    108.5 ± 6.3    110.3 ± 5.3    0.49 ± 0.03    3.58 ± 0.17    3.09 ± 0.18
A1-14    127.0 ± 4.9     403.5 ± 19.1    109.7 ± 5.3    93.1 ± 6.3     1.16 ± 0.02    4.34 ± 0.15    3.18 ± 0.15
A1-19    187.5 ± 17.9    422.7 ± 7.7     70.3 ± 7.4     73.0 ± 2.8     2.67 ± 0.05    5.80 ± 0.15    3.13 ± 0.16
A1-22    224.6 ± 9.4     482.2 ± 43.6    100.0 ± 4.0    96.2 ± 9.6     2.25 ± 0.03    5.02 ± 0.11    2.77 ± 0.12
A1-26    160.2 ± 8.9     454.1 ± 18.4    93.1 ± 5.4     90.7 ± 3.2     1.72 ± 0.02    5.01 ± 0.08    3.29 ± 0.08

FIGURE 2 Results of 5' nuclease assay comparing β-actin probes with TAMRA at different nucle-
otide positions. As described in Materials and Methods, PCR amplifications containing the in-
dicated probes were performed, and the fluorescence emission was measured at 518 and 582 nm.
Reported values are the average ±1 S.D. for six reactions run without added template (no temp.)
and six reactions run with template (+temp.). The RQ ratio was calculated for each individual
reaction and averaged to give the reported RQ- and RQ+ values.
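
The three normalizations described under Fluorescence Detection (blank subtraction, the reporter/quencher ratio, and the difference from the no-template control) can be sketched as follows. The function names are hypothetical. The example uses the mean emissions reported for probe A1-7 in Figure 2 and assumes those means are already blank-corrected, so the blanks are set to zero. The paper computes RQ for each individual reaction before averaging, so a ratio of means only approximates the reported values.

```python
def rq_ratio(reporter: float, quencher: float,
             reporter_blank: float = 0.0, quencher_blank: float = 0.0) -> float:
    """Blank-corrected reporter (518 nm) over quencher (582 nm) emission."""
    return (reporter - reporter_blank) / (quencher - quencher_blank)


def delta_rq(rq_plus: float, rq_minus: float) -> float:
    """Increase in RQ of the complete reaction over the no-template control."""
    return rq_plus - rq_minus


# Mean emissions for probe A1-7 taken from Figure 2.
rq_minus = rq_ratio(reporter=53.5, quencher=108.5)   # no template
rq_plus = rq_ratio(reporter=395.1, quencher=110.3)   # with template
d_rq = delta_rq(rq_plus, rq_minus)

print(round(rq_minus, 2), round(rq_plus, 2), round(d_rq, 2))  # 0.49 3.58 3.09
```

The rounded results agree with the RQ-, RQ+, and ΔRQ entries listed for A1-7, which suggests the ratio-of-means shortcut is adequate for a quick check of tabulated data.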

RESULTS 

A series of probes with increasing dis-
tances between the fluorescein reporter
and rhodamine quencher were tested to
investigate the minimum and maximum
spacing that would give an acceptable
performance in the 5' nuclease PCR as-
say. These probes hybridize to a target
sequence in the human β-actin gene.
Figure 2 shows the results of an experi-
ment in which these probes were in-
cluded in PCR that amplified a segment
of the β-actin gene containing the target
sequence. Performance in the 5' nu-
clease PCR assay is monitored by the
magnitude of ΔRQ, which is a measure
of the increase in reporter fluorescence
caused by PCR amplification of the
probe target. Probe A1-2 has a ΔRQ value
that is close to zero, indicating that the
probe was not cleaved appreciably dur-
ing the amplification reaction. This sug-
gests that with the quencher dye on the
second nucleotide from the 5' end, there
is insufficient room for Taq polymerase
to cleave efficiently between the reporter
and quencher. The other five probes ex-
hibited comparable ΔRQ values that are
clearly different from zero. Thus, all five
probes are being cleaved during PCR am-
plification resulting in a similar increase
in reporter fluorescence. It should be
noted that complete digestion of a probe
produces a much larger increase in re-
porter fluorescence than that observed
in Figure 2 (data not shown). Thus, even
in reactions where amplification occurs,
the majority of probe molecules remain
uncleaved. It is mainly for this reason
that the fluorescence intensity of the
quencher dye TAMRA changes little with
amplification of the target. This is what
allows us to use the 582-nm fluorescence
reading as a normalization factor.

The magnitude of RQ- depends
mainly on the quenching efficiency in-
herent in the specific structure of the
probe and the purity of the oligonucle-
otide. Thus, the larger RQ- values indi-
cate that probes A1-14, A1-19, A1-22, and
A1-26 probably have reduced quenching
as compared with A1-7. Still, the degree
of quenching is sufficient to detect a
highly significant increase in reporter
fluorescence when each of these probes
is cleaved during PCR.

To further investigate the ability of
TAMRA on the 3' end to quench 6-FAM
on the 5' end, three additional pairs of
probes were tested in the 5' nuclease
PCR assay. For each pair, one probe has
TAMRA attached to an internal nucle-
otide and the other has TAMRA attached
to the 3' end nucleotide. The results are
shown in Table 2. For all three sets, the
probe with the 3' quencher exhibits a
ΔRQ value that is considerably higher
than for the probe with the internal
quencher. The RQ- values suggest that
differences in quenching are not as great
as those observed with some of the A1
probes. These results demonstrate that a
quencher dye on the 3' end of an oligo-
nucleotide can quench efficiently the



TABLE 2 Results of 5' Nuclease Assay Comparing Probes with TAMRA Attached to an Internal or 3'-terminal Nucleotide

         518 nm                          582 nm
Probe    no temp.        + temp.         no temp.       + temp.         RQ-            RQ+            ΔRQ
A3-6     54.6 ± 3.2      84.8 ± 3.7      116.2 ± 6.4    115.6 ± 2.5     0.47 ± 0.02    0.73 ± 0.03    0.26 ± 0.04
A3-24    72.1 ± 2.9      236.5 ± 11.1    84.2 ± 4.0     90.2 ± 3.8      0.86 ± 0.02    2.62 ± 0.05    1.76 ± 0.05
P2-7     82.8 ± 4.4      384.0 ± 34.1    105.1 ± 6.4    120.4 ± 10.2    0.79 ± 0.02    3.19 ± 0.16    2.40 ± 0.16
P2-27    113.4 ± 6.6     555.4 ± 14.1    140.7 ± 8.5    118.7 ± 4.8     0.81 ± 0.01    4.68 ± 0.10    3.68 ± 0.10
P5-10    77.5 ± 6.5      244.4 ± 15.9    86.7 ± 4.3     95.8 ± 6.7      0.89 ± 0.05    2.55 ± 0.06    1.66 ± 0.08
P5-28    64.0 ± 5.2      333.6 ± 12.1    100.6 ± 6.1    94.7 ± 6.3      0.63 ± 0.02    3.53 ± 0.12    2.89 ± 0.13

Reactions containing the indicated probes and calculations were performed as described in Materials and Methods and in the legend to Fig. 2.
fluorescence of a reporter dye on the 5'
end. The degree of quenching is suffi- 
cient for this type of oligonucleotide to 
be used as a probe in the 5' nuclease PCR 
assay. 

To test the hypothesis that quenching 
by a 3' TAMRA depends on the flexibility 
of the oligonucleotide, fluorescence was 
measured for probes in the single- 
stranded and double-stranded states. Ta- 
ble 3 reports the fluorescence observed 
at 518 and 582 nm. The relative degree 
of quenching is assessed by calculating 
the RQ ratio. For probes with TAMRA 
6-10 nucleotides from the 5' end, there 
is little difference in the RQ values when 
comparing single-stranded with double- 
stranded oligonucleotides. The results 
for probes with TAMRA at the 3' end are 
much different. For these probes, hy- 
bridization to a complementary strand 
causes a dramatic increase in RQ. We 
propose that this loss of quenching is 
caused by the rigid structure of double- 
stranded DNA, which prevents the 5' 
and 3' ends from being in proximity. 
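
The contrast between internally labeled and 3'-labeled probes can be made concrete by computing how much RQ changes on hybridization. The sketch below copies the single-stranded and double-stranded RQ values from Table 3; the function name and the dictionary layout are illustrative choices, not part of the original method.

```python
# (RQ single-stranded, RQ double-stranded) for each probe, from Table 3.
TABLE3_RQ = {
    "A1-7": (0.45, 0.50), "A1-26": (0.81, 5.43),
    "A3-6": (0.43, 0.38), "A3-24": (0.45, 3.21),
    "P2-7": (0.64, 0.58), "P2-27": (0.61, 5.25),
    "P5-10": (0.44, 0.87), "P5-28": (0.46, 4.43),
}


def hybridization_fold_change(rq_ss: float, rq_ds: float) -> float:
    """Ratio of double-stranded to single-stranded RQ for one probe."""
    return rq_ds / rq_ss


fold_changes = {
    name: hybridization_fold_change(ss, ds)
    for name, (ss, ds) in TABLE3_RQ.items()
}

for name, change in fold_changes.items():
    print(f"{name}: {change:.1f}-fold")
```

On these values the four probes with TAMRA at the 3' end (A1-26, A3-24, P2-27, P5-28) each change more than fivefold, while the internally labeled probes change less than twofold.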

When TAMRA is placed toward the 3'
end, there is a marked Mg2+ effect on
quenching. Figure 3 shows a plot of ob-
served RQ values for the A1 series of
probes as a function of Mg2+ concentra-
tion. With TAMRA attached near the 5'
end (probe A1-2 or A1-7), the RQ value at
0 mM Mg2+ is only slightly higher than
RQ at 10 mM Mg2+. For probes A1-19,
A1-22, and A1-26, the RQ values at 0 mM
Mg2+ are very high, indicating a much
reduced quenching efficiency. For each
of these probes, there is a marked de-
crease in RQ at 1 mM Mg2+ followed by
a gradual decline as the Mg2+ concen-
tration increases to 10 mM. Probe A1-14
shows an intermediate RQ value at 0 mM
Mg2+ with a gradual decline at higher
Mg2+ concentrations. In a low-salt en-
vironment with no Mg2+ present, a sin-
gle-stranded oligonucleotide would be
expected to adopt an extended confor-
mation because of electrostatic repul-
sion. The binding of Mg2+ ions acts to
shield the negative charge of the phos-
phate backbone so that the oligonucle-
otide can adopt conformations where
the 3' end is close to the 5' end. There-
fore, the observed Mg2+ effects support
the notion that quenching of a 5' re-
porter dye by TAMRA at or near the 3'
end depends on the flexibility of the oli-
gonucleotide.

DISCUSSION 

The striking finding of this study is that 
it seems the rhodamine dye TAMRA, 
placed at any position in an oligonucle- 
otide, can quench the fluorescent emis- 
sion of a fluorescein (6-FAM) placed at 
the 5' end. This implies that a single- 
stranded, double-labeled oligonucle- 
otide must be able to adopt conforma- 
tions where the TAMRA is close to the 5' 
end. It should be noted that the decay of 
6-FAM in the excited state requires a cer- 
tain amount of time. Therefore, what 



TABLE 3 Comparison of Fluorescence Emissions of Single-stranded and
Double-stranded Fluorogenic Probes

          518 nm              582 nm              RQ
Probe     ss       ds         ss       ds         ss      ds
A1-7      27.75    68.53      61.08    138.18     0.45    0.50
A1-26     43.31    509.38     53.50    93.86      0.81    5.43
A3-6      16.75    62.88      39.33    165.57     0.43    0.38
A3-24     30.05    578.64     67.72    140.25     0.45    3.21
P2-7      35.02    70.13      54.63    121.09     0.64    0.58
P2-27     39.89    320.47     65.10    61.13      0.61    5.25
P5-10     27.34    144.85     61.95    165.54     0.44    0.87
P5-28     33.65    462.29     72.39    104.61     0.46    4.43

(ss) Single-stranded. The fluorescence emissions at 518 or 582 nm for solutions containing a final
concentration of 50 nM indicated probe, 10 mM Tris-HCl (pH 8.3), 50 mM KCl, and 10 mM MgCl2.
(ds) Double-stranded. The solutions contained, in addition, 100 nM A1C for probes A1-7 and
A1-26, 100 nM A3C for probes A3-6 and A3-24, 100 nM P2C for probes P2-7 and P2-27, or 100 nM
P5C for probes P5-10 and P5-28. Before the addition of MgCl2, 120 µl of each sample was heated
at 95°C for 5 min. Following the addition of 80 µl of 25 mM MgCl2, each sample was allowed to
cool to room temperature and the fluorescence emissions were measured. Reported values are
the average of three determinations.



matters for quenching is not the average 
distance between 6-FAM and TAMRA 
but, rather, how close TAMRA can get to 
6-FAM during the lifetime of the 6-FAM 
excited state. As long as the decay time of 
the excited state is relatively long com- 
pared with the molecular motions of the 
oligonucleotide, quenching can occur. 
Thus, we propose that TAMRA at the 3' 
end, or any other position, can quench 
6-FAM at the 5' end because TAMRA is in 
proximity to 6-FAM often enough to be 
able to accept energy transfer from an 
excited 6-FAM. 

Details of the fluorescence measure- 
ments remain puzzling. For example, Ta- 
ble 3 shows that hybridization of probes 
Al-26, A3-24, and P5-28 to their comple- 
mentary strands not only causes a large 
increase in 6-FAM fluorescence at 518 
nm but also causes a modest increase in 
TAMRA fluorescence at 582 nm. If 
TAMRA is being excited by energy trans- 
fer from quenched 6-FAM, then loss of 
quenching attributable to hybridization 
should cause a decrease in the fluores- 
cence emission of TAMRA. The fact that 
the fluorescence emission of TAMRA in- 
creases indicates that the situation is 
more complex. For example, we have an- 
ecdotal evidence that the bases of the 
oligonucleotide, especially G, quench 
the fluorescence of both 6-FAM and 
TAMRA to some degree. When double- 
stranded, base-pairing may reduce the 
ability of the bases to quench. The pri- 
mary factor causing the quenching of 
6-FAM in an intact probe is the TAMRA 
dye. Evidence for the importance of 
TAMRA is that 6-FAM fluorescence 
remains relatively unchanged when 
probes labeled only with 6-FAM are used 
in the 5' nuclease PCR assay (data not 
shown). Secondary effectors of fluores- 
cence, both before and after cleavage of 
the probe, need to be explored further. 

Regardless of the physical mecha- 
nism, the relative independence of posi- 
tion and quenching greatly simplifies 
the design of probes for the 5' nuclease 
PCR assay. There are three main factors 
that determine the performance of a 
double-labeled fluorescent probe in the 
5' nuclease PCR assay. The first factor is 
the degree of quenching observed in the 
intact probe. This is characterized by the 
value of RQ-, which is the ratio of re-
porter to quencher fluorescent emis-
sions for a no template control PCR. In-
fluences on the value of RQ- include
the particular reporter and quencher 
FIGURE 3 Effect of Mg2+ concentration on RQ ratio for the A1 series of probes. The fluorescence
emission intensity at 518 and 582 nm was measured for solutions containing 50 nM probe, 10 mM
Tris-HCl (pH 8.3), 50 mM KCl, and varying amounts (0-10 mM) of MgCl2. The calculated RQ
ratios (518 nm intensity divided by 582 nm intensity) are plotted vs. MgCl2 concentration (mM
Mg). The key (upper right) shows the probes examined.



dyes used, spacing between reporter and 
quencher dyes, nucleotide sequence 
context effects, presence of structure or 
other factors that reduce flexibility of 
the oligonucleotide, and purity of the 
probe. The second factor is the efficiency 
of hybridization, which depends on 
probe Tm, presence of secondary struc-
ture in probe or template, annealing
temperature, and other reaction condi-
tions. The third factor is the efficiency at
which Taq DNA polymerase cleaves the
bound probe between the reporter and
quencher dyes. This cleavage is depen-
dent on sequence complementarity be-
tween probe and template as shown by
the observation that mismatches in the
segment between reporter and quencher
dyes drastically reduce the cleavage of
probe. (1)

The rise in RQ- values for the A1 se-
ries of probes seems to indicate that the
degree of quenching is reduced some-
what as the quencher is placed toward
the 3' end. The lowest apparent quench-
ing is observed for probe A1-19 (see Fig.
3) rather than for the probe where the
TAMRA is at the 3' end (A1-26). This is
understandable, as the conformation of
the 3' end position would be expected to
be less restricted than the conformation
of an internal position. In effect, a
quencher at the 3' end is freer to adopt
conformations close to the 5' reporter
dye than is an internally placed
quencher. For the other three sets of
probes, the interpretation of RQ- values
is less clear-cut. The A3 probes show the
same trend as A1, with the 3' TAMRA
probe having a larger RQ- than the in-
ternal TAMRA probe. For the P2 pair,
both probes have about the same RQ-
value. For the P5 probes, the RQ- for the
3' probe is less than for the internally
labeled probe. Another factor that may
explain some of the observed variation is
that purity affects the RQ- value. Al-
though all probes are HPLC purified, a
small amount of contamination with
unquenched reporter can have a large ef-
fect on RQ-.

Although there may be a modest ef-
fect on degree of quenching, the posi-
tion of the quencher apparently can
have a large effect on the efficiency of
probe cleavage. The most drastic effect is
observed with probe A1-2, where place-
ment of the TAMRA on the second nu-
cleotide reduces the efficiency of cleav-
age to almost zero. For the A3, P2, and P5
probes, ΔRQ is much greater for the 3'
TAMRA probes as compared with the in-
ternal TAMRA probes. This is explained
most easily by assuming that probes
with TAMRA at the 3' end are more likely
to be cleaved between reporter and
quencher than are probes with TAMRA
attached internally. For the A1 probes,
the cleavage efficiency of probe A1-7
must already be quite high, as ΔRQ does
not increase when the quencher is
placed closer to the 3' end. This illus-
trates the importance of being able to
use probes with a quencher on the 3'
end in the 5' nuclease PCR assay. In this
assay, an increase in the intensity of re-
porter fluorescence is observed only
when the probe is cleaved between the
reporter and quencher dyes. By placing
the reporter and quencher dyes on the
opposite ends of an oligonucleotide
probe, any cleavage that occurs will be
detected. When the quencher is attached
to an internal nucleotide, sometimes the
probe works well (A1-7) and other times
not so well (A3-6). The relatively poor
performance of probe A3-6 presumably
means the probe is being cleaved 3' to
the quencher rather than between the
reporter and quencher. Therefore, the
best chance of having a probe that reli-
ably detects accumulation of PCR prod-
uct in the 5' nuclease PCR assay is to use
a probe with the reporter and quencher
dyes on opposite ends.

Placing the quencher dye on the 3'
end may also provide a slight benefit in
terms of hybridization efficiency. The
presence of a quencher attached to an
internal nucleotide might be expected to
disrupt base-pairing and reduce the Tm
of a probe. In fact, a 2°C-3°C reduction
in Tm has been observed for two probes
with internally attached TAMRAs. (9) This
disruptive effect would be minimized by
placing the quencher at the 3' end. Thus,
probes with 3' quenchers might exhibit
slightly higher hybridization efficiencies
than probes with internal quenchers.

The combination of increased cleav-
age and hybridization efficiencies means
that probes with 3' quenchers probably
will be more tolerant of mismatches be-
tween probe and target as compared
with internally labeled probes. This tol-
erance of mismatches can be advanta-
geous, as when trying to use a single
probe to detect PCR-amplified products
from samples of different species. Also, it
means that cleavage of probe during PCR
is less sensitive to alterations in an-
nealing temperature or other reaction
conditions. The one application where
tolerance of mismatches may be a disad-
vantage is for allelic discrimination. Lee
et al. (1) demonstrated that allele-specific
probes were cleaved between reporter
and quencher only when hybridized to a
perfectly complementary target. This al-
lowed them to distinguish the normal
human cystic fibrosis allele from the
ΔF508 mutant. Their probes had TAMRA
attached to the seventh nucleotide from
the 5' end and were designed so that any
mismatches were between the reporter
and quencher. Increasing the distance
between reporter and quencher would
lessen the disruptive effect of mis-
matches and allow cleavage of the probe
on the incorrect target. Thus, probes
with a quencher attached to an internal
nucleotide may still be useful for allelic
discrimination.

In this study loss of quenching upon 
hybridization was used to show that 
quenching by a 3' TAMRA is dependent 
on the flexibility of a single-stranded oli- 
gonucleotide. The increase in reporter 
fluorescence intensity, though, could 
also be used to determine whether hy- 
bridization has occurred or not. Thus, 
oligonucleotides with reporter and 
quencher dyes attached at opposite ends 
should also be useful as hybridization 
probes. The ability to detect hybridiza- 
tion in real time means that these probes 
could be used to measure hybridization 
kinetics. Also, this type of probe could be 
used to develop homogeneous hybridization
assays for diagnostics or other applications.
Bagwell et al. (10) describe just 
this type of homogeneous assay where 
hybridization of a probe causes an in- 
crease in fluorescence caused by a loss of 
quenching. However, they utilized a 
complex probe design that requires add- 
ing nucleotides to both ends of the 
probe sequence to form two imperfect 
hairpins. The results presented here 
demonstrate that the simple addition of 
a reporter dye to one end of an oligonu- 
cleotide and a quencher dye to the other 
end generates a fluorogenic probe that 
can detect hybridization or PCR amplifi- 
cation. 
We have developed a novel "real time" quantitative PCR method. The method measures PCR product 
accumulation through a dual-labeled fluorogenic probe (i.e., TaqMan Probe). This method provides very 
accurate and reproducible quantitation of gene copies. Unlike other quantitative PCR methods, real-time PCR 
does not require post-PCR sample handling, preventing potential PCR product carry-over contamination and 
resulting in much faster and higher throughput assays. The real-time PCR method has a very large dynamic 
range of starting target molecule determination (at least five orders of magnitude). Real-time quantitative 
PCR is extremely accurate and less labor-intensive than current quantitative PCR methods. 



Quantitative nucleic acid sequence analysis has had an important role in many
fields of biological research. Measurement of gene expression (RNA) has been
used extensively in monitoring biological responses to various stimuli (Tan et al.
1994; Huang et al. 1995a,b; Prud'homme et al. 1995). Quantitative gene analysis
(DNA) has been used to determine the genome quantity of a particular gene, as in
the case of the human HER2 gene, which is amplified in ~30% of breast tumors
(Slamon et al. 1987). Gene and genome quantitation (DNA and RNA) also have been
used for analysis of human immunodeficiency virus (HIV) burden demonstrating
changes in the levels of virus throughout the different phases of the disease
(Connor et al. 1993; Piatak et al. 1993b; Furtado et al. 1995). 

Many methods have been described for the quantitative analysis of nucleic acid
sequences (both for RNA and DNA; Southern 1975; Sharp et al. 1980; Thomas 1980).
Recently, PCR has proven to be a powerful tool for quantitative nucleic acid
analysis. PCR and reverse transcriptase (RT)-PCR have permitted the analysis of
minimal starting quantities of nucleic acid (as little as one cell equivalent).
This has made possible many experiments that could not have been performed with
traditional methods. Although PCR has provided a powerful tool, it is imperative
that it be used properly for quantitation (Raeymaekers 1995). Many early reports
of quantitative PCR and RT-PCR described quantitation of the PCR product but did
not measure the initial target sequence quantity. It is essential to design
proper controls for the quantitation of the initial target sequences (Ferre 1992;
Clementi et al. 1993). 

Researchers have developed several methods of quantitative PCR and RT-PCR. One
approach measures PCR product quantity in the log phase of the reaction before
the plateau (Kellogg et al. 1990; Pang et al. 1990). This method requires that
each sample has equal input amounts of nucleic acid and that each sample under
analysis amplifies with identical efficiency up to the point of quantitative
analysis. A gene sequence (contained in all samples at relatively constant
quantities, such as β-actin) can be used for sample amplification efficiency
normalization. Using conventional methods of PCR detection and quantitation (gel
electrophoresis or plate capture hybridization), it is extremely laborious to
assure that all samples are analyzed during the log phase of the reaction (for
both the target gene and the normalization gene). Another method, quantitative
competitive (QC)-PCR, has been developed and is used widely for PCR quantitation.
QC-PCR relies on the inclusion of an internal control competitor in each reaction
(Becker-Andre 1991; Piatak et al. 1993a,b). The efficiency of each reaction is
normalized to the internal competitor. A known amount of internal competitor can be 
added to each sample. To obtain relative quantitation, the unknown target PCR
product is compared with the known competitor PCR product. Success of a
quantitative competitive PCR assay relies on developing an internal control that
amplifies with the same efficiency as the target molecule. The design of the
competitor and the validation of amplification efficiencies require a dedicated
effort. However, because QC-PCR does not require that PCR products be analyzed
during the log phase of the amplification, it is the easier of the two methods
to use. 

Several detection systems are used for quantitative PCR and RT-PCR analysis:
(1) agarose gels, (2) fluorescent labeling of PCR products and detection with
laser-induced fluorescence using capillary electrophoresis (Fasco et al. 1995;
Williams et al. 1996) or acrylamide gels, and (3) plate capture and sandwich
probe hybridization (Mulder et al. 1994). Although these methods proved
successful, each method requires post-PCR manipulations that add time to the
analysis and may lead to laboratory contamination. The sample throughput of
these methods is limited (with the exception of the plate capture approach),
and, therefore, these methods are not well suited for uses demanding high sample
throughput (i.e., screening of large numbers of biomolecules or analyzing
samples for diagnostics or clinical trials). 

Here we report the development of a novel assay for quantitative DNA analysis.
The assay is based on the use of the 5' nuclease assay first described by
Holland et al. (1991). The method uses the 5' nuclease activity of Taq
polymerase to cleave a nonextendible hybridization probe during the extension
phase of PCR. The approach uses dual-labeled fluorogenic hybridization probes
(Lee et al. 1993; Bassler et al. 1995; Livak et al. 1995a,b). One fluorescent
dye serves as a reporter [FAM (i.e., 6-carboxyfluorescein)] and its emission
spectra is quenched by the second fluorescent dye, TAMRA (i.e.,
6-carboxy-tetramethyl-rhodamine). The nuclease degradation of the hybridization
probe releases the quenching of the FAM fluorescent emission, resulting in an
increase in peak fluorescent emission at 518 nm. The use of a sequence detector
(ABI Prism) allows measurement of fluorescent spectra of all 96 wells of the
thermal cycler continuously during the PCR amplification. Therefore, the
reactions are monitored in real time. The output data is described and
quantitative analysis of input target DNA sequences is discussed below. 



RESULTS 



PCR Product Detection in Real Time 

The goal was to develop a high-throughput, sensitive, and accurate gene
quantitation assay for use in monitoring lipid mediated therapeutic gene
delivery. A plasmid encoding human factor VIII gene sequence, pF8TM (see
Methods), was used as a model therapeutic gene. The assay uses fluorescent
Taqman methodology and an instrument capable of measuring fluorescence in real
time (ABI Prism 7700 Sequence Detector). The Taqman reaction requires a
hybridization probe labeled with two different fluorescent dyes. One dye is a
reporter dye (FAM), the other is a quenching dye (TAMRA). When the probe is
intact, fluorescent energy transfer occurs and the reporter dye fluorescent
emission is absorbed by the quenching dye (TAMRA). During the extension phase of
the PCR cycle, the fluorescent hybridization probe is cleaved by the 5'-3'
nucleolytic activity of the DNA polymerase. On cleavage of the probe, the
reporter dye emission is no longer transferred efficiently to the quenching dye,
resulting in an increase of the reporter dye fluorescent emission spectra. PCR
primers and probes were designed for the human factor VIII sequence and human
β-actin gene (as described in Methods). Optimization reactions were performed to
choose the appropriate probe and magnesium concentrations yielding the highest
intensity of reporter fluorescent signal without sacrificing specificity. The
instrument uses a charge-coupled device (i.e., CCD camera) for 
measuring the fluorescent emission spectra from 500 to 650 nm. Each PCR tube was
monitored sequentially for 25 msec with continuous monitoring throughout the
amplification. Each tube was re-examined every 8.5 sec. Computer software was
designed to examine the fluorescent intensity of both the reporter dye (FAM) and
the quenching dye (TAMRA). The fluorescent intensity of the quenching dye,
TAMRA, changes very little over the course of the PCR amplification (data not
shown). Therefore, the intensity of TAMRA dye emission serves as an internal
standard with which to normalize the reporter dye (FAM) emission variations. The
software calculates a value termed ΔRn (or ΔRQ) using the following equation:
ΔRn = (Rn+) − (Rn−), where Rn+ = emission intensity of reporter/emission
intensity of quencher at any given time in a reaction tube, and Rn− = emission
intensity of reporter/emission intensity of quencher measured prior to PCR
amplification in that same reaction tube. For the purpose of quantitation, the
last three data points (ΔRns) collected during the extension step for each PCR
cycle were analyzed. The nucleolytic degradation of the hybridization probe
occurs during the extension phase of PCR, and, therefore, reporter fluorescent
emission increases during this time. The three data points were averaged for
each PCR cycle and the mean value for each was plotted in an "amplification
plot" shown in Figure 1A. The ΔRn mean value is plotted on the y-axis, and time,
represented by cycle number, is plotted on the x-axis. During the early cycles
of the PCR amplification, the ΔRn 



value remains at base line. When sufficient hybridization probe has been cleaved
by the Taq polymerase nuclease activity, the intensity of reporter fluorescent
emission increases. Most PCR amplifications reach a plateau phase of reporter
fluorescent emission if the reaction is carried out to high cycle numbers. The
amplification plot is examined early in the reaction, at a point that
represents the log phase of product accumulation. This is done by assigning an
arbitrary threshold that is based on the variability of the base-line data. In
Figure 1A, the threshold was set at 10 standard deviations above the mean of
base line emission calculated from cycles 1 to 15. Once the threshold is chosen,
the point at which 
Figure 1 PCR product detection in real time. (A) The Model 7700 software will construct amplification plots 
from the extension phase fluorescent emission data collected during the PCR amplification. The standard 
deviation is determined from the data points collected from the base line of the amplification plot. CT values are 
calculated by determining the point at which the fluorescence exceeds a threshold limit (usually 10 times the 
standard deviation of the base line). (B) Overlay of amplification plots of serially (1:2) diluted human genomic 
DNA samples amplified with β-actin primers. (C) Input DNA concentration of the samples plotted versus CT. All 
the amplification plot crosses the threshold is defined as CT. CT is reported as
the cycle number at this point. As will be demonstrated, the CT value is
predictive of the quantity of input target. 
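
The calculation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the Model 7700 software: the function names are hypothetical, and the linear interpolation between the two cycles that bracket the threshold is an assumption made here to obtain a fractional cycle number (the text states only that CT is the point at which the plot crosses the threshold).

```python
from statistics import mean, stdev


def delta_rn(reporter, quencher, reporter_pre, quencher_pre):
    """Delta Rn = Rn+ - Rn-: the reporter/quencher emission ratio at a given
    time minus the same ratio measured before PCR amplification."""
    return reporter / quencher - reporter_pre / quencher_pre


def threshold_from_baseline(delta_rn_by_cycle, baseline_cycles=15, n_sd=10.0):
    """Threshold = baseline mean + n_sd standard deviations, with the
    baseline taken from the first `baseline_cycles` cycles."""
    baseline = delta_rn_by_cycle[:baseline_cycles]
    return mean(baseline) + n_sd * stdev(baseline)


def threshold_cycle(delta_rn_by_cycle, threshold):
    """Fractional cycle at which the amplification plot first crosses the
    threshold, or None if it never does. List index 0 is cycle 1.
    The interpolation step is an assumption of this sketch."""
    for i in range(1, len(delta_rn_by_cycle)):
        below, above = delta_rn_by_cycle[i - 1], delta_rn_by_cycle[i]
        if below < threshold <= above:
            # index i-1 is cycle i; add the fraction of the way to cycle i+1
            return i + (threshold - below) / (above - below)
    return None
```

Given one mean ΔRn value per cycle, `threshold_from_baseline` followed by `threshold_cycle` yields a CT for that reaction tube.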

CT Values Provide a Quantitative Measurement of 
Input Target Sequences 

Figure 1B shows amplification plots of 15 different PCR amplifications overlaid.
The amplifications were performed on a 1:2 serial dilution of human genomic DNA.
The amplified target was human β-actin. The amplification plots shift to the
right (to higher threshold cycles) as the input target quantity is reduced. This
is expected because reactions with fewer starting copies of the target molecule
require greater amplification to degrade enough probe to attain the threshold
fluorescence. An arbitrary threshold of 10 standard deviations above the base
line was used to determine the CT values. Figure 1C represents the CT values
plotted versus the sample dilution value. Each dilution was amplified in
triplicate PCR amplifications and plotted as mean values with error bars
representing one standard deviation. The CT values decrease linearly with
increasing target quantity. Thus, CT values can be used as a quantitative
measurement of the input target number. It should be noted that the
amplification plot for the 15.6-ng sample shown in Figure 1B does not reflect
the same fluorescent rate of increase exhibited by most of the other samples.
The 15.6-ng sample also achieves endpoint plateau at a lower fluorescent value
than would be expected based on the input DNA. This phenomenon has been observed
occasionally with other samples (data not shown) and may be attributable to late
cycle inhibition; this hypothesis is still under investigation. It is important
to note that the flattened slope and early plateau do not impact significantly
the calculated CT value as demonstrated by the fit on the line shown in
Figure 1C. All triplicate amplifications resulted in very similar CT values; the
standard deviation did not exceed 0.5 for any dilution. This experiment contains
a >100,000-fold range of input target molecules. Using CT values for
quantitation permits a much larger assay range than directly using total
fluorescent emission intensity for quantitation. The linear range of fluorescent
intensity measurement of the ABI Prism 7700 Se-

ments over a very large range of relative starting 
target quantities. 
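
The linear relationship between CT and the logarithm of input target can be illustrated with a least-squares standard curve. The sketch below is an assumption-laden example, not the authors' software: the function names are hypothetical, and the dilution series in the usage lines is invented (an idealized 1:2 series in which CT rises by exactly one cycle per dilution).

```python
import math


def fit_standard_curve(input_ng, ct_values):
    """Least-squares fit of CT = intercept + slope * log10(input amount)."""
    xs = [math.log10(q) for q in input_ng]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ct_values) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Read an unknown sample's input amount off the fitted standard curve."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical, idealized 1:2 dilution series (ng input DNA, mean CT).
slope, intercept = fit_standard_curve([100.0, 50.0, 25.0, 12.5],
                                      [18.0, 19.0, 20.0, 21.0])
print(f"slope={slope:.2f} intercept={intercept:.2f}")
print(f"CT 19.5 corresponds to {quantity_from_ct(19.5, slope, intercept):.1f} ng")
```

With perfect doubling each cycle, the slope of such a curve is −1/log10(2), about −3.32 cycles per tenfold change in input.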

Sample Preparation Validation 

Several parameters influence the efficiency of PCR amplification: magnesium and
salt concentrations, reaction conditions (i.e., time and temperature), PCR
target size and composition, primer sequences, and sample purity. All of the
above factors are common to a single PCR assay, except sample to sample purity.
In an effort to validate the method of sample preparation for the factor VIII
assay, PCR amplification reproducibility and efficiency of 10 replicate sample
preparations were examined. After genomic DNA was prepared from the 10 replicate
samples, the DNA was quantitated by ultraviolet spectroscopy. Amplifications
were performed analyzing β-actin gene content in 100 and 25 ng of total genomic
DNA. Each PCR amplification was performed in triplicate. Comparison of CT values
for each triplicate sample show minimal variation based on standard deviation
and coefficient of variance (Table 1). Therefore, each of the triplicate PCR
amplifications was highly reproducible, demonstrating that real time PCR using
this instrumentation introduces minimal variation into the quantitative PCR
analysis. Comparison of the mean CT values of the 10 replicate sample
preparations also showed minimal variability, indicating that each sample
preparation yielded similar results for β-actin gene quantity. The highest CT
difference between any of the samples was 0.85 and 0.73 for the 100 and 25 ng
samples, respectively. Additionally, the amplification of each sample exhibited
an equivalent rate of fluorescent emission intensity change per amount of DNA
target analyzed as indicated by similar slopes derived from the sample dilutions
(Fig. 2). Any sample containing an excess of a PCR inhibitor would exhibit a
greater measured β-actin CT value for a given quantity of DNA. In addition, the
inhibitor would be diluted along with the sample in the dilution analysis
(Fig. 2), altering the expected CT value change. Each sample amplification
yielded a similar result in the analysis, demonstrating that this method of
sample preparation is highly reproducible with regard to sample purity. 
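
The per-sample statistics referred to above (mean, standard deviation, and coefficient of variance of triplicate CT values) can be computed as in the following sketch. The function name is hypothetical, the CV is expressed here as a percentage of the mean (an assumption about the table's convention), and the three CT values are illustrative numbers of the kind reported for a 100-ng triplicate.

```python
from statistics import mean, stdev


def triplicate_stats(ct_values):
    """Return (mean, sample standard deviation, CV in percent) of CT values."""
    m = mean(ct_values)
    sd = stdev(ct_values)
    return m, sd, 100.0 * sd / m


m, sd, cv = triplicate_stats([18.24, 18.23, 18.33])
print(f"mean={m:.2f}  SD={sd:.2f}  CV={cv:.2f}%")
```

A CV well under 1% for each triplicate is what the text describes as minimal variation introduced by the instrument.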

Quantitative Analysis of a Plasmid After 

Table 1. Reproducibility of Sample Preparation Method

                        100 ng                                     25 ng
Sample   --------------------------------------   --------------------------------------
no.      CT      CT      CT      mean   SD    CV    CT      CT      CT      mean   SD    CV

1        18.24   18.23   18.33   18.27  0.06  0.32  20.48   20.55   20.5    20.51  0.03  0.17
2        18.33   18.35   18.44   18.37  0.06  0.32  20.61   20.59   20.41   20.54  0.11  0.54
3        18.3    18.3    18.42   18.34  0.07  0.36  20.54   20.6    20.49   20.54  0.06  0.28
4        18.15   18.23   18.32   18.23  0.08  0.46  20.48   20.44   20.38   20.43  0.05  0.26
5        18.4    18.38   18.46   18.42  0.04  0.23  20.68   20.87   20.63   20.73  0.13  0.61
6        18.54   18.67   19      18.74  0.24  1.26  21.09   21.04   21.04   21.06  0.03  0.15
7        18.28   18.36   18.52   18.39  0.12  0.66  20.67   20.73   20.65   20.68  0.04  0.2
8        18.45   18.7    18.73   18.63  0.16  0.83  20.98   20.84   20.75   20.86  0.12  0.57
9        18.18   18.34   18.26   18.29  0.1   0.54  20.46   20.54   20.48   20.51  0.07  0.32
10       18.42   18.57   18.66   18.55  0.12  0.65  20.79   20.78   20.62   20.73  0.1   0.46

Mean                             18.42  0.17  0.90                          20.66  0.19  0.94



tor containing a partial cDNA for human factor VIII, pF8TM. A series of
transfections was set up using a decreasing amount of the plasmid (40, 4, 0.5,
and 0.1 μg). Twenty-four hours post-transfection, total DNA was purified from
each flask of cells. β-Actin gene quantity was chosen as a value for
normalization of genomic DNA concentration from each sample. In this experiment,
β-actin gene content should remain constant relative to total genomic DNA.
Figure 3 shows the result of the β-actin DNA measurement (100 ng total DNA
determined by ultraviolet spectroscopy) of each sample. Each sample was analyzed
in triplicate and the mean β-actin CT values of the triplicates were plotted
(error bars represent one standard deviation). The highest difference between
any two sample means was 0.95 CT. Ten nanograms of total DNA of each sample were
also examined for β-actin. The results again showed that very similar amounts of
genomic DNA were present; the maximum mean β-actin CT value difference was 1.0.
As Figure 3 shows, the rate of β-actin CT change between the 100 and 10-ng
samples was similar (slope values range between −3.56 and −3.45). This verifies
again that the method of sample preparation yields samples of identical PCR
integrity (i.e., no sample contained an excessive amount of a PCR inhibitor).
However, these results indicate that each sample contained slight differences in
the actual amount of genomic DNA analyzed. Determination of actual genomic DNA
concentration was accomplished 
Figure 2 Sample preparation purity. The replicate 
samples shown in Table 1 were also amplified in 
triplicate using 25 ng of each DNA sample. The figure 
shows the input DNA concentration (100 and 
25 ng) vs. CT. In the figure, the 100 and 25 ng 
points for each sample are connected by a line. 



by plotting the mean β-actin CT value obtained for each 100-ng sample on a
β-actin standard curve (shown in Fig. 4C). The actual genomic DNA concentration
of each sample, a, was obtained by extrapolation to the x-axis. 
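
The slope comparison used above as a check for PCR inhibitors can be sketched as follows. This is an illustrative example under stated assumptions: the function names are hypothetical, the 0.2-cycle tolerance is an arbitrary choice made here, and the CT values are invented rather than taken from Figure 3.

```python
import math


def dilution_slope(ct_high_input, ct_low_input, high_ng=100.0, low_ng=10.0):
    """Slope of CT versus log10(ng input DNA) between two input amounts."""
    return (ct_high_input - ct_low_input) / (math.log10(high_ng) - math.log10(low_ng))


def slopes_agree(slopes, tolerance=0.2):
    """True when all slopes fall within `tolerance` cycles per log10 of each
    other. The tolerance is an assumption of this sketch."""
    return max(slopes) - min(slopes) <= tolerance


# Hypothetical mean beta-actin CT values at (100 ng, 10 ng) for four samples.
samples = {
    "40 ug": (18.1, 21.6),
    "4 ug": (18.6, 22.1),
    "0.5 ug": (18.9, 22.4),
    "0.1 ug": (19.0, 22.5),
}
slopes = {name: dilution_slope(ct100, ct10) for name, (ct100, ct10) in samples.items()}
print(slopes)
print(slopes_agree(list(slopes.values())))
```

A sample carrying excess inhibitor would be expected to show a slope that departs from the others, because the inhibitor is diluted along with the DNA.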

Figure 4A shows the measured (i.e., non-normalized) quantities of factor VIII
plasmid DNA (pF8TM) from each of the four transient cell transfections. Each
reaction contained 100 ng of total sample DNA (as determined by UV
spectroscopy). Each sample was analyzed in triplicate 



[Figure 3 plot: β-actin CT versus log (ng input DNA), one line per amount of pF8TM transfected]

Figure 3 Analysis of transfected cell DNA quantity 
and purity. The DNA preparations of the four 293 
cell transfections (40, 4, 0.5, and 0.1 μg of pF8TM) 
were analyzed for the β-actin gene. 100 and 10 ng 
(determined by ultraviolet spectroscopy) of each 
sample were amplified in triplicate. For each 
amount of pF8TM that was transfected, the β-actin 
CT values are plotted versus the total input DNA 



PCR amplifications. As shown, pF8TM purified from the 293 cells decreases (mean
CT values increase) with decreasing amounts of plasmid transfected. The mean CT
values obtained for pF8TM in Figure 4A were plotted on a standard curve
comprised of serially diluted pF8TM, shown in Figure 4B. The quantity of pF8TM,
b, found in each of the four transfections was determined by extrapolation to
the x axis of the standard curve in Figure 4B. These uncorrected values, b, for
pF8TM were normalized to determine the actual amount of pF8TM found per 100 ng
of genomic DNA by using the equation: 



\[
\frac{b \times 100\ \text{ng}}{a} = \text{actual pF8TM copies per 100 ng of genomic DNA}
\]



where a = actual genomic DNA in a sample and b = pF8TM copies from the standard
curve. The normalized quantity of pF8TM per 100 ng of genomic DNA for each of
the four transfections is shown in Figure 4D. These results show that the
quantity of factor VIII plasmid associated with the 293 cells, 24 hr after
transfection, decreases with decreasing plasmid concentration used in the
transfection. The quantity of pF8TM associated with 293 cells, after
transfection with 40 μg of plasmid, was 35 pg per 100 ng genomic DNA. This
results in ~520 plasmid copies per cell. 
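
The normalization equation can be written as a one-line function. The sketch below is illustrative only: the function name is hypothetical and the input values are invented, chosen so that the result matches the order of magnitude reported in the text.

```python
def normalize_per_100ng(b, a_ng):
    """Normalize an uncorrected plasmid quantity b (read from the pF8TM
    standard curve) to 100 ng of genomic DNA, where a_ng is the actual
    genomic DNA in the sample (read from the beta-actin standard curve)."""
    if a_ng <= 0:
        raise ValueError("actual genomic DNA amount must be positive")
    return b * 100.0 / a_ng


# Hypothetical example: 28 pg of plasmid measured in a reaction that
# actually contained 80 ng of genomic DNA rather than the nominal 100 ng.
print(normalize_per_100ng(28.0, 80.0))  # 35.0
```

The correction matters only to the extent that the ultraviolet estimate of input DNA differs from the β-actin estimate; when a equals 100 ng the value of b is unchanged.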



DISCUSSION 

We have described a new method for quantitating gene copy numbers using
real-time analysis of PCR amplifications. Real-time PCR is compatible with
either of the two PCR (RT-PCR) approaches: (1) quantitative competitive where an
internal competitor for each target sequence is used for normalization (data not
shown) or (2) quantitative comparative PCR using a normalization gene contained
within the sample (i.e., β-actin) or a "housekeeping" gene for RT-PCR. If equal
amounts of nucleic acid are analyzed for each sample and if the amplification
efficiency before quantitative analysis is identical for each sample, the
internal control (normalization gene or competitor) should give equal signals
for all samples. 

The real-time PCR method offers several advantages over the other two methods
currently employed (see the Introduction). First, the real-time PCR method is
performed in a closed-tube system and requires no post-PCR manipulation 
Figure 4 Quantitative analysis of pF8TM in transfected cells. (A) Amount of 
plasmid DNA used for the transfection plotted against the mean CT value 
determined for pF8TM remaining 24 hr after transfection. (B,C) Standard curves of 
pF8TM and β-actin, respectively. pF8TM DNA (B) and genomic DNA (C) were 
diluted serially 1:5 before amplification with the appropriate primers. The β-actin 
standard curve was used to normalize the results of A to 100 ng of genomic DNA. 
(D) The amount of pF8TM present per 100 ng of genomic DNA. 



of sample. Therefore, the potential for PCR contamination in the laboratory is
reduced because amplified products can be analyzed and disposed of without
opening the reaction tubes. Second, this method supports the use of a
normalization gene (i.e., β-actin) for quantitative PCR or housekeeping genes
for quantitative RT-PCR controls. Analysis is performed in real time during the
log phase of product accumulation. Analysis during log phase permits many
different genes (over a wide input target range) to be analyzed simultaneously,
without concern of reaching reaction plateau at different cycles. This will make
multigene analysis assays much easier to develop, because individual internal
competitors will not be needed for each gene under analysis. Third, sample
throughput will increase dramatically with the new method because there is no
post-PCR processing time. Additionally, working in a 96-well format is highly
compatible with automation technology. 

The real-time PCR method is highly reproducible. Replicate amplifications can be
analyzed for each sample minimizing potential error. The system allows for a
very large assay dynamic range (approaching 1,000,000-fold starting target).
Using a standard curve for the target of interest, relative copy number values
can be determined for any unknown sample. Fluorescent threshold values, CT,
correlate linearly with relative DNA copy numbers. Real time quantitative RT-PCR
methodology (Gibson et al., this issue) has also been developed. Finally, real
time quantitative PCR methodology can be used to develop high-throughput
screening assays for a variety of applications [quantitative gene expression
(RT-PCR), gene copy assays (Her2, HIV, etc.), genotyping (knockout mouse
analysis), and immuno-PCR]. 

Real-time PCR may also be performed using intercalating dyes (Higuchi et al.
1992) such as ethidium bromide. The fluorogenic probe method offers a major
advantage over intercalating dyes: greater specificity (i.e., primer dimers and
nonspecific PCR products are not detected). 
METHODS 

Generation of a Plasmid Containing a Partial 
cDNA for Human Factor VIII 

Total RNA was harvested (RNAzol B from Tel-Test, Inc., Friendswood, TX) from
cells transfected with a factor VIII expression vector, pCIS2.8c25D (Eaton et
al. 1986; Gorman et al. 1990). A factor VIII partial cDNA sequence was generated
by RT-PCR [GeneAmp EZ rTth RNA PCR Kit (part N808-0179, PE Applied Biosystems,
Foster City, CA)] using the PCR primers F8for and F8rev (primer sequences are
shown below). The amplicon was reamplified using modified F8for and F8rev
primers (appended with BamHI and HindIII restriction site sequences at the 5'
end) and cloned into pGEM-3Z (Promega Corp., Madison, WI). The resulting clone,
pF8TM, was used for transient transfection of cells. 

Amplification of Target DNA and Detection of Amplicon 

Factor VIII plasmid DNA 

(pPSTM) was vinpllfttnl with llitr |fti»wi-s VHUn S'-OCXl-. 

arc k ;<;AAci au:j tj ai x i i on*o-3* and Ptsrev .v-aaa< ;<rr- 

tMCCXn*GGA*IX«j rACiG-a'.llw rvttvUou produced tt AM** 
bp PCR product. The forward primer was designed to recognize a unique sequence
found in the 5' untranslated region of the parent pCIS2.8c25D plasmid and
therefore does not recognize and amplify the human factor VIII gene. Primers
were chosen with the assistance of the computer program Oligo 4.0 (National
Biosciences, Inc., Plymouth, MN). The human β-actin gene was amplified with the
primers β-actin forward primer 5'-TCACCCACACTGTGCCCATCTACGA-3' and β-actin
reverse primer 5'-CAGCGGAACCGCTCATTGCCAATGG-3'. The reaction produced a 295-bp
PCR product. 

A m pi l flea i ton reactions (SO -pj) com Mined a i)NA 
sample, Klx PCIt liuffr.r II (6 p\) t 200 pM dATP, dCTl\ 
dCTl\ and 400 p-m riUTP, 4 inu Mg<:i ?t Unhs Ampll 
7Vk; t;na poiymcia^c, u.5 unit Ampwnsc uracil N-Rly- 

i:ti.iyluM.- (UNO), SO pmok- of cftch factoi Vlll jiflnici/ unci 15 
ptimh* <if vtudi p otcilrt pilmw. Tho u-at llui^ htv<i i»>nlatncd 
One Of the foMo w *»R UPlCClhin prohos (UK) nu rprh)* 

A'<i'AW)Ac:crrcri'c:cu(:<rr<iC.-n , (rrrr<n'C3 , i- 

GCCTT(TAMRA )p a'iiud p-«ctiu probv'5 r (FAM)ATGC»X^ 
X(TAMRA)CCCCCATCCCATCr>-.T whrrr p indicates 
phosphorylation and X indicates a linker arm nucleotide. Reaction tubes were
MicroAmp Optical Tubes (part number N801, Perkin Elmer) that were frosted (at
Perkin Elmer) to prevent light from reflecting. Tube caps were similar to
MicroAmp Caps but specially designed to prevent light scattering. All of the PCR
consumables were supplied by PE Applied Biosystems (Foster City, CA) except the
factor VIII primers, which were synthesized at Genentech, Inc. (South San
Francisco, CA). Probes were designed using the Oligo 4.0 software, following
guidelines suggested in the Model 7700 Sequence Detector instrument manual.
Briefly, probe Tm should be at least 5°C higher than the annealing temperature
used during thermal cycling; primers should not form stable duplexes with the
probe. 

The thermal cycling conditions included 1 min at 50°C and 10 min at 95°C.
Thermal cycling proceeded with 



reactions were performed in the Model 7700 Sequence Detector (PE Applied
Biosystems), which contains a GeneAmp PCR System 9600. Reaction conditions were
programmed on a Power Macintosh 7100 (Apple Computer, Santa Clara, CA) linked
directly to the Model 7700 Sequence Detector. Analysis of data was also
performed on the Macintosh computer. Collection and analysis software was
developed at PE Applied Biosystems. 



Transfection of Cells with Factor VIII Construct 

Four T175 flasks of 293 cells (ATCC CRL 1573), a human fetal kidney suspension
cell line, were grown to 80% confluency and transfected with pF8TM. Cells were
grown in the following media: 50% HAM'S F12 without GHT, 50% low glucose
Dulbecco's modified Eagle medium (DMEM) without glycine with sodium bicarbonate,
10% fetal bovine serum, 2 mM L-glutamine, and 1% penicillin-streptomycin. The
media was changed 30 min before the transfection. pF8TM DNA amounts of 40, 4,
0.5, and 0.1 μg were added to 1.5 ml of a solution containing 0.125 M CaCl2 and
1× HEPES. The four mixtures were left at room temperature for 10 min and then
added dropwise to the cells. The flasks were incubated at 37°C and 5% CO2 for
24 hr, washed with PBS, and resuspended in PBS. The resuspended cells were
divided into aliquots and DNA was extracted immediately using the QIAamp Blood
Kit (Qiagen, 
Chatsworth, CA). DNA was eluted into 200 μl of 20 mM Tris-HCl at pH 8.0. 



ACKNOWLEDGMENTS 

We thank Genentech's DNA Synthesis Group for primer synthesis and Genentech's
Graphics Group for assistance with the figures. 

The publication costs of this article were defrayed in part by payment of page
charges. This article must therefore be hereby marked "advertisement" in
accordance with 18 USC section 1734 solely to indicate this fact. 



REFERENCES 

Bassler, H.A., S.J. Flood, K.J. Livak, J. Marmaro, R. Knorr, 
and C.A. Batt. 1995. Use of a fluorogenic probe in a 
PCR-based assay for the detection of Listeria 
monocytogenes. App. Environ. Microbiol. 61: 3724-3728. 

Becker-Andre, M. 1991. Quantitative evaluation of 
mRNA levels. Meth. Mol. Cell. Biol. 2: 189-201. 

Clementi, M., S. Menzo, P. Bagnarelli, A. Manzin, A. 
Valenza, and P.E. Varaldo. 1993. Quantitative PCR and 
RT-PCR in virology. [Review]. PCR Methods Applic. 

Connor, R.I., H. Mohri, Y. Cao, and D.D. Ho. 1993. 
Increased viral burden and cytopathicity correlate 
temporally with CD4+ T-lymphocyte decline and 
clinical progression in human immunodeficiency virus 
type 1-infected individuals. J. Virol. 67: 1772-1777. 

Eaton, D.L., W.I. Wood, D. Eaton, P.E. Hass, P. 



From : BnL 



PHONE No. 



310 472 0905 



Dec. 05 2002 12:2GAM P19 



HF1D LI AL 

Venar, *nd C« <ion»uu, 1!>86. cXmAUuctUm am\ 
ritaracwrUatlcm of an m iWc factor VH1 variant lacking 
Ihc cent ml nno I bird of ihc molecule, nUrlsttitiStry 
25; 8343-KW. 

Masco, MJ., CP. Trcanor. 5. .SjriVack, II.U t'tggc, and 1J5. 
Kaminsky. 1995- Quantitative KNA-polymcw **Haln 
rcaction-UNA analysis by cwplH^ry vlvUioptwwrtU and 
law-litduccri flnorcsccncv AnaL HUh'IkM, 2^*' Uh-147. 

Verro, I'. 1992. Quautiiativonr swiri-guwolUJlivc l*:R: 
Kttallty versus mytln ^CK Appilc. 2; 1-9, 

Vurl»rf<>, MJL. l^A. Xiue*li«y. A""* Wollnsky, 1995. 
Omiiko »«< ihc vital inRNA rwprr*.don pattern correlate 
with a rapid rate of CD4 4 "JVcvli number rierllw In 
human iromunodaflcloncy virus typo Mufti-tun* 
individual*, A Ytfti. *to 7sm*j.im. 

CiWm U.E.M , C.A. »*ul Williams. \W6 A 

novrl method for rwd limo q«»tit««*tlw competitive 
RT-IK:iL GenonwHes. <thi* Utuc). 

Oiirmwi. CLM-. ".k. Glcs, H!)d t:. McCray. 1990, 
Ti.in.dcm production of proteins nsing an adenovirus 
Ufliwfcifmcd cell line. i)NA ProU £n$in- "'Wii. 2: 3 10, 

lueuelil, K., Wtilliiigcr, P5- W»UU, «»d II. Otffilh. 
lyHZ. Simultaneous nniplificiatlcMi and dotation of 
specific DNA MujuctKcs. l\iot<tluwltw IOi 1 1 3 AW, 

Holland, J'-M_, R.D. Atnftinaoit, H, WHjou, and I>.lh 
CicKwud. 109]. J Election of «|H'«jflc polynwr.w chain 
reaction product Oy vtillxlng Our 5* — V exunuckw 
ainlvliy of Tfi«nnu» ttv|im(U.tiA UNA polymcrAic. JW, 

N/ir/. >L-iw/. St i. Sft: 72?u-72t.U>. 

j 

Huang. S.K. # H.Q. Xiao, TJ. Kielne, t.;. raooiu, li.G. 
Mwsh, L.M. Uchienstcin, and M.C Uu- l99Aa. 1U13 
expression at the .sites ot allergen challengi' In pattern.* 
with asthma. A Itmnwu 155: Z68R-2694* 

Huang, M. Yl, E. Miner, and JU!. Marsh. 199SU. A 
dominant T cell receptor bct*<haln in response to a 
Short ragweed allergen, Awb a S. ). Ummm* 

t 

KcIlopR, DX„ JJ. Snlatky, Jind S. Kowfc. 1990. 
QufUiilUtion Oi niV-l jJTmHnil PNA rnlaiiv^ 10 COlltiUr 
UNA by the polymcxaso chain reaction. AntiL RtacUctn. 

Lee. 1-C'. f C.R. Council, and W, Woch. 1903. Allelic 
dlsaltnlJiaUou by nlrk-lranslttlknj )»CllwlUi fluorogonlc 
probes. NuctcJc Acids Kca. ZU M6\-?>7GC- 

Uvak. KJ-, S0» Hlood, J. Mannaro, w. liiusti, and K. 
Dectz. 19954. OUgonudi-oikUrS with fluorcjiCttnl dyes *v 
opptwilo ends provide a qutfuCftcd probe systwn uvrful 
for di tccUug l*CR pniduci i\nd nnrldr ac id 
hybridation. ^» MrfWt * 357 362. 

Uvait. K..J., J. MarniJ»nj f and J- A- Todd. 100Sb. Toward* 



h»Uy o«iomatc*1 g^nAmo.vuittf |>olymori)liJsi» screw lina 
iter! Nature Cmct. 9t M 1 • 3 4 

Widdcr. J. ( N. Mv-Kinncy, t :. tJinsto|ihcrwn J. Siilnsky, 
u v;i«nfic.id, and t; r Kwuk. i<><m. lupid »id «(mplc 1^J{ 

ot*»>- fv>r qujtntlUdon «f human Immmnvloflclrncy vims 
lypr I UNA in plasma: Application to aaitc retroviral 
infection. ) Clin, Mirmbht. 32: 292- 300. 

J'ain;. S.. Y. Koyunagl. S. C« WUoy, 1J.Y. VJnicra, 

(iud 1J?. Chen. )«> l >0. 1 llfth Jrvnla of unln1CB«»fTi H1V.1 
DNA in briiin nssiic of AtlXS dementia pauViii*. Satutr 

I'UUK, M.j., l.uk, 1». Wniij«i» r <md J.n. Llfonn. 

iyy/a. ^uantitailvp anniwtillvi* polymerase, chain 
rcaiiH'/n lor accurnii: quantitation »t mv DNA <»id RNA 
sj>cclcs. MWrrhmtfUes 14; 70-Ki- 

Pidiak, M^.. M ^ i-f « Yang, SJ. CUrK. J.t:. Knppos/ 

JCC*. l-UK, U.H. Huliti, CV.M. and J.U. Uf>Uii, ) 993b. 

High levels oi hiv-l in plasjna. aurmg ail sta W v» nf 
inteciion dctcrmniwl t/y eoiupeTllive k:h [,hcu 
CviinncntM. AVf<awr 2-^9: 1749-175^1. 

I'lud'huiiimv, tij., Kono. and AJS, ThCOfllniKiuhis. 

Quant tt alive jiolyiiicnixc Chain ICOCllon analysts 
TWf*K marked uvcrcNpri«$mn nf interlcukin- 1 bcti», 
iiucncUKln-l and lnlcrfcrwi'gniiini« inRNA In the lymph 
nodw of lu|m>-pnnif mil*-. M&L Uiimuiwh 3^: -t0i-£03. 

Kiicymackcrr., I.. 1S»9A. A commcntevry mi iho prpcdcal 
djiplUallon.! at cwiii|/cllUve i^:ll- <7 4 v/w«<r ^» «1 04. 

Sltdip, P.A.. A.J. ncrk. and S.M. llci-ROt, 19HO- 
Tran4cri))tinn niujNS of ftd*novlru«. Mtfhott* KnxytttaL 

Sluinan, PJ., (i.M- Clark. S.C, Wnng, WJ. U-vin, a. 
Ullrich, and W.K Mi^iulre. J9H7. Human breast tuiiiwrj 
Correlation nl-wlapse and survival wlUi amphficuilinil of 
the 1 WR-2/neu oncogene, .fc^ncr 235: 1 7'/- 1S2. 

Soudicni, J-l-W. 19'AS. tMwlion of specific wquurax^ 
among ONA fragments separated by $p\ dccirophorcsis. 
I MoL li\oU 98;M).V517. 

Tan, X., X. 5un. uk Cionzoiez. and W. tbuvl'. iwm. 
and TNi r IT) crease Hie ^fci.ursor of Ni'-Wappa H ji£0 
mhNA in moesc Inicslluvr Q«*nllUtlv<* analysis by 
CO)UlM:t»lvc I'CK. JSH/chitfh btophy*. Atff 1215: 157 l/#2. 

•llioinas, P»S. IVWK Hyt>ridly.«tmn ol tkt»aujrcd ItrVA ind 
small L>m fiagrncnw trmiafcrrcd nhrocKOluloio. I*rfir. 
Natl. Acntl. S<L 77\ 5201-5205. 

Williams, S., ScV.wci', A. Krlshnarao, C Held, 11. 
KarKCT, and r.M. Wlliiainj. 1996. Quantitutivo 
comtKihive i-cii; A««»r*l* of »'«P lifi *< J P«od«ctt of the 
H1V-1 ror £fn»c by ^pillary eJcctroj^horests wtlvlawr 
induced nuore,c^ncc detection- Anal BioeMnu (in press). 



Received June 3, 1996; accepted in revised form July 29,
1996.



Proc. Natl. Acad. Sci. USA 

Vol. 95, pp. 14717-14722, December 1998 

Cell Biology, Medical Sciences. 

WISP genes are members of the connective tissue growth factor 
family that are up-regulated in Wnt-1-transformed cells and
aberrantly expressed in human colon tumors 

Diane Pennica*^ Todd A. Swanson*, James W. Welsh*, Margaret A. Roy*, David A. Lawrence*, 
James Lee*, Jennifer Brush*, Lisa A. Taneyhill§, Bethanne Deuel*, Michael Lew^, Colin WatanabeII, 
Robert L. Cohen*, Mona F. Melhem**, Gene G. Finley**, Phil Quirke**, Audrey D. Goddard*, 
Kenneth J. Hillan 11 , Austin L. Gurney*, David Botstein****, and Arnold J. Levine§ 

Departments of *Molecular Oncology, ^Molecular Biology, •Scientific Computing, and , Pathology, Genentech Inc., 1 DNA Way, South San Francisco, CA 94080;
••University of Pittsburgh School of Medicine, Veterans Administration Medical Center, Pittsburgh, PA 15240; ^University of Leeds, Leeds LS2 9JT, United
Kingdom; ^Department of Genetics, Stanford University, Palo Alto, CA 94305; and 8 Department of Molecular Biology, Princeton University, Princeton, NJ 
08544 



Contributed by David Botstein and Arnold J. Levine, October 21, 1998 

ABSTRACT Wnt family members are critical to many 
developmental processes, and components of the Wnt signal- 
ing pathway have been linked to tumorigenesis in familial and 
sporadic colon carcinomas. Here we report the identification 
of two genes, WISP-1 and WISP-2, that are up-regulated in the
mouse mammary epithelial cell line C57MG transformed by
Wnt-1, but not by Wnt-4. Together with a third related gene,
WISP-3, these proteins define a subfamily of the connective
tissue growth factor family. Two distinct systems demon- 
strated WISP induction to be associated with the expression of 
Wnt-1. These included (i) C57MG cells infected with a Wnt-1 
retroviral vector or expressing Wnt-1 under the control of a 
tetracycline-repressible promoter, and (ii) Wnt-1 transgenic
mice. The WISP-1 gene was localized to human chromosome
8q24.1-8q24.3. WISP-1 genomic DNA was amplified in colon
cancer cell lines and in human colon tumors and its RNA 
overexpressed (2- to >30-fold) in 84% of the tumors examined 
compared with patient-matched normal mucosa. WISP-3
mapped to chromosome 6q22-6q23 and also was overex-
pressed (4- to >40-fold) in 63% of the colon tumors analyzed.
In contrast, WISP-2 mapped to human chromosome 20q12-
20q13 and its DNA was amplified, but RNA expression was
reduced (2- to >30-fold) in 79% of the tumors. These results 
suggest that the WISP genes may be downstream of Wnt-1 
signaling and that aberrant levels of WISP expression in colon 
cancer may play a role in colon tumorigenesis. 



Wnt-1 is a member of an expanding family of cysteine-rich, 
glycosylated signaling proteins that mediate diverse develop- 
mental processes such as the control of cell proliferation, 
adhesion, cell polarity, and the establishment of cell fates (1,
2). Wnt-1 originally was identified as an oncogene activated by 
the insertion of mouse mammary tumor virus in virus-induced 
mammary adenocarcinomas (3, 4). Although Wnt-1 is not 
expressed in the normal mammary gland, expression of Wnt-1 
in transgenic mice causes mammary tumors (5). 

In mammalian cells, Wnt family members initiate signaling 
by binding to the seven-transmembrane spanning Frizzled 
receptors and recruiting the cytoplasmic protein Dishevelled 
(Dsh) to the cell membrane (1, 2, 6). Dsh then inhibits the 
kinase activity of the normally constitutively active glycogen
synthase kinase-3β (GSK-3β) resulting in an increase in
β-catenin levels. Stabilized β-catenin interacts with the tran-
scription factor TCF/Lef1, forming a complex that appears in
the nucleus and binds TCF/Lef1 target DNA elements to
activate transcription (7, 8). Other experiments suggest that
the adenomatous polyposis coli (APC) tumor suppressor gene
also plays an important role in Wnt signaling by regulating
β-catenin levels (9). APC is phosphorylated by GSK-3β, binds
to β-catenin, and facilitates its degradation. Mutations in
either APC or β-catenin have been associated with colon
carcinomas and melanomas, suggesting these mutations con-
tribute to the development of these types of cancer, implicating
the Wnt pathway in tumorigenesis (1).

Although much has been learned about the Wnt signaling 
pathway over the past several years, only a few of the tran- 
scriptionally activated downstream components activated by 
Wnt have been characterized. Those that have been described 
cannot account for all of the diverse functions attributed to 
Wnt signaling. Among the candidate Wnt target genes are 
those encoding the nodal-related 3 gene, Xnr3, a member of 
the transforming growth factor (TGF)-β superfamily, and the
homeobox genes, engrailed, goosecoid, twin (Xtwn), and siamois
(2). A recent report also identifies c-myc as a target gene of the 
Wnt signaling pathway (10). 

To identify additional downstream genes in the Wnt signal- 
ing pathway that are relevant to the transformed cell pheno- 
type, we used a PCR-based cDNA subtraction strategy, sup- 
pression subtractive hybridization (SSH) (11), using RNA 
isolated from C57MG mouse mammary epithelial cells and 
C57MG cells stably transformed by a Wnt-1 retrovirus. Over- 
expression of Wnt-1 in this cell line is sufficient to induce a 
partially transformed phenotype, characterized by elongated 
and refractile cells that lose contact inhibition and form a 
multilayered array (12, 13). We reasoned that genes differen-
tially expressed between these two cell lines might contribute
to the transformed phenotype. 

In this paper, we describe the cloning and characterization 
of two genes up-regulated in Wnt-1 transformed cells, WISP-1
and WISP-2, and a third related gene, WISP-3. The WISP genes
are members of the CCN family of growth factors, which 
includes connective tissue growth factor (CTGF), Cyr61, and 
nov, a family not previously linked to Wnt signaling. 

MATERIALS AND METHODS 

SSH. SSH was performed by using the PCR-Select cDNA
Subtraction Kit (CLONTECH). Tester double-stranded
cDNA was synthesized from 2 µg of poly(A)+ RNA isolated
from the C57MG/Wnt-1 cell line and driver cDNA from 2 µg
of poly(A)+ RNA from the parent C57MG cells. The sub-
tracted cDNA library was subcloned into a pGEM-T vector for
further analysis.

Abbreviations: TGF, transforming growth factor; CTGF, connective
tissue growth factor; SSH, suppression subtractive hybridization;
VWC, von Willebrand factor type C module.
Data deposition: The sequences reported in this paper have been
deposited in the GenBank database (accession nos. AF100777,
AF100778, AF100779, AF100780, and AF100781).
†To whom reprint requests should be addressed. e-mail: diane@gene.com.

cDNA Library Screening. Clones encoding full-length
mouse WISP-1 were isolated by screening a λgt10 mouse
embryo cDNA library (CLONTECH) with a 70-bp probe from
the original partial clone 568 sequence corresponding to amino
acids 128-169. Clones encoding full-length human WISP-1
were isolated by screening λgt10 lung and fetal kidney cDNA
libraries with the same probe at low stringency. Clones en-
coding full-length mouse and human WISP-2 were isolated by
screening a C57MG/Wnt-1 or human fetal lung cDNA library
with a probe corresponding to nucleotides 1463-1512. Full-
length cDNAs encoding WISP-3 were cloned from human
bone marrow and fetal kidney libraries.

Expression of Human WISP RNA. PCR amplification of 
first-strand cDNA was performed with human Multiple Tissue 
cDNA panels (CLONTECH) and 300 µM of each dNTP at
94°C for 1 sec, 62°C for 30 sec, 72°C for 1 min, for 22-32 cycles.
WISP and glyceraldehyde-3-phosphate dehydrogenase primer
sequences are available on request. 

In Situ Hybridization. 33P-labeled sense and antisense ribo-
probes were transcribed from an 897-bp PCR product corre-
sponding to nucleotides 601-1440 of mouse WISP-1 or a
294-bp PCR product corresponding to nucleotides 82-375 of
mouse WISP-2. All tissues were processed as described (40).

Radiation Hybrid Mapping. Genomic DNA from each 
hybrid in the Stanford G3 and Genebridge4 Radiation Hybrid 
Panels (Research Genetics, Huntsville, AL) and human and 
hamster control DNAs were PCR-amplified, and the results
were submitted to the Stanford or Massachusetts Institute of 
Technology web servers. 

Cell Lines, Tumors, and Mucosa Specimens. Tissue speci- 
mens were obtained from the Department of Pathology (Uni- 
versity of Pittsburgh) for patients undergoing colon resection 
and from the University of Leeds, United Kingdom. Genomic 
DNA was isolated (Qiagen) from the pooled blood of 10 
normal human donors, surgical specimens, and the following 
ATCC human cell lines: SW480, COLO 320DM, HT-29, 
WiDr, and SW403 (colon adenocarcinomas), SW620 (lymph 
node metastasis, colon adenocarcinoma), HCT 116 (colon 
carcinoma), SK-CO-1 (colon adenocarcinoma, ascites), and 
HM7 (a variant of ATCC colon adenocarcinoma cell line LS 
174T). DNA concentration was determined by using Hoechst 
dye 33258 intercalation fluorimetry. Total RNA was prepared
by homogenization in 7 M GuSCN followed by centrifugation 
over CsCl cushions or prepared by using RNAzol. 

Gene Amplification and RNA Expression Analysis. Relative 
gene amplification and RNA expression of WISPs and c-myc in 
the cell lines, colorectal tumors, and normal mucosa were 
determined by quantitative PCR. Gene-specific primers and
fluorogenic probes (sequences available on request) were
designed and used to amplify and quantitate the genes. The
relative gene copy number was derived by using the formula
$2^{\Delta Ct}$, where $\Delta Ct$ represents the difference in amplification
cycles required to detect the WISP genes in peripheral blood
lymphocyte DNA compared with colon tumor DNA or colon
tumor RNA compared with normal mucosal RNA. The
δ-method was used for calculation of the SE of the gene copy
number or RNA expression level. The WISP-specific signal was
normalized to that of the glyceraldehyde-3-phosphate dehy-
drogenase housekeeping gene. All TaqMan assay reagents
were obtained from Perkin-Elmer Applied Biosystems. 
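
The relative quantification described above can be illustrated with a short Python sketch. It is not the authors' analysis code. The function names, the example Ct values, and the exact way the glyceraldehyde-3-phosphate dehydrogenase (GAPDH) signal enters the calculation (subtracting the housekeeping-gene cycle difference from the target-gene cycle difference) are assumptions made for illustration. The standard error uses a first-order delta-method approximation, which may differ in detail from the calculation used in the paper.

```python
import math


def relative_quantity(ct_reference, ct_sample):
    """Return 2**dCt, where dCt = Ct(reference) - Ct(sample).

    A sample that crosses threshold one cycle earlier than the
    reference therefore gives a relative quantity of 2.
    """
    return 2.0 ** (ct_reference - ct_sample)


def normalized_relative_quantity(ct_target_ref, ct_target_sample,
                                 ct_gapdh_ref, ct_gapdh_sample):
    """Relative quantity of a target gene after housekeeping normalization.

    Assumption for illustration: normalization is done by subtracting the
    GAPDH cycle difference from the target-gene cycle difference.
    """
    delta_target = ct_target_ref - ct_target_sample
    delta_gapdh = ct_gapdh_ref - ct_gapdh_sample
    return 2.0 ** (delta_target - delta_gapdh)


def delta_method_se(delta_ct, se_delta_ct):
    """First-order delta-method SE of 2**dCt.

    d/dx (2**x) = ln(2) * 2**x, so SE(2**x) is approximately
    ln(2) * 2**x * SE(x).
    """
    return math.log(2.0) * (2.0 ** delta_ct) * se_delta_ct


if __name__ == "__main__":
    # Hypothetical Ct values: tumor DNA detected one cycle before normal DNA.
    print(relative_quantity(ct_reference=25.0, ct_sample=24.0))
    print(delta_method_se(delta_ct=1.0, se_delta_ct=0.1))
```

Under these assumptions, a 2-fold amplification corresponds to a one-cycle difference and a 4-fold amplification to a two-cycle difference.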

RESULTS 

Isolation of WISP-1 and WISP-2 by SSH. To identify Wnt-
1-inducible genes, we used the technique of SSH using the
mouse mammary epithelial cell line C57MG and C57MG cells
that stably express Wnt-1 (11). Candidate differentially ex-
pressed cDNAs (1,384 total) were sequenced. Thirty-nine
percent of the sequences matched known genes or homo- 
logues, 32% matched expressed sequence tags, and 29% had 
no match. To confirm that the transcript was differentially 
expressed, semiquantitative reverse transcription-PCR and 
Northern analysis were performed by using mRNA from the 
C57MG and C57MG/Wnt-1 cells.

Two of the cDNAs, WISP-1 and WISP-2, were differentially 
expressed, being induced in the C57MG/Wnt-1 cell line, but
not in the parent C57MG cells or C57MG cells overexpressing 
Wnt-4 (Fig. 1 A and B). Wnt-4, unlike Wnt-1, does not induce 
the morphological transformation of C57MG cells and has no 
effect on β-catenin levels (13, 14). Expression of WISP-1 was
up-regulated approximately 3-fold in the C57MG/Wnt-1 cell
line and WISP-2 by approximately 5-fold by both Northern 
analysis and reverse transcription-PCR. 

An independent, but similar, system was used to examine 
WISP expression after Wnt-1 induction. C57MG cells express- 
ing the Wnt-1 gene under the control of a tetracycline- 
repressible promoter produce low amounts of Wnt-1 in the 
repressed state but show a strong induction of Wnt-1 mRNA 
and protein within 24 hr after tetracycline removal (8). The 
levels of Wnt-1 and WISP RNA isolated from these cells at 
various times after tetracycline removal were assessed by 
quantitative PCR. Strong induction of Wnt-1 mRNA was seen 
as early as 10 hr after tetracycline removal. Induction of WISP 
mRNA (2- to 6-fold) was seen at 48 and 72 hr (data not shown). 
These data support our previous observations that show that 
WISP induction is correlated with Wnt-1 expression. Because 
the induction is slow, occurring after approximately 48 hr, the 
induction of WISPs may be an indirect response to Wnt-1 
signaling. 

cDNA clones of human WISP-1 were isolated and the 
sequence compared with mouse WISP-1. The cDNA sequences
of mouse and human WISP-1 were 1,766 and 2,830 bp in length,
respectively, and encode proteins of 367 aa, with predicted
relative molecular masses of ≈40,000 (Mr 40 K). Both have
hydrophobic N-terminal signal sequences, 38 conserved cys-
teine residues, and four potential N-linked glycosylation sites
and are 84% identical (Fig. 2A).

Full-length cDNA clones of mouse and human WISP-2 were
1,734 and 1,293 bp in length, respectively, and encode proteins
of 251 and 250 aa, respectively, with predicted relative molec-
ular masses of ≈27,000 (Mr 27 K) (Fig. 2B). Mouse and human
WISP-2 are 73% identical. Human WISP-2 has no potential
N-linked glycosylation sites, and mouse WISP-2 has one at
position 197. WISP-2 has 28 cysteine residues that are con-
served among the 38 cysteines found in WISP-1.

Fig. 1. WISP-1 and WISP-2 are induced by Wnt-1, but not Wnt-4,
expression in C57MG cells. Northern analysis of WISP-1 (A) and
WISP-2 (B) expression in C57MG, C57MG/Wnt-1, and C57MG/
Wnt-4 cells. Poly(A)+ RNA (2 µg) was subjected to Northern blot
analysis and hybridized with a 70-bp mouse WISP-1-specific probe
(amino acids 278-300) or a 190-bp WISP-2-specific probe (nucleotides
1438-1627) in the 3' untranslated region. Blots were rehybridized with
human β-actin probe.

Fig. 2. Encoded amino acid sequence alignment of mouse and
human WISP-1 (A) and mouse and human WISP-2 (B). The potential
signal sequence, insulin-like growth factor-binding protein (IGF-BP),
VWC, thrombospondin (TSP), and C-terminal (CT) domains are
underlined.

Identification of WISP-3. To search for related proteins, we
screened expressed sequence tag (EST) databases with the
WISP-1 protein sequence and identified several ESTs as
potentially related sequences. We identified a homologous
protein that we have called WISP-3. A full-length human
WISP-3 cDNA of 1,371 bp was isolated corresponding to those
ESTs that encode a 354-aa protein with a predicted molecular
mass of 39,293. WISP-3 has two potential N-linked glycosyl-
ation sites and 36 cysteine residues. An alignment of the three
human WISP proteins shows that WISP-1 and WISP-3 are the
most similar (42% identity), whereas WISP-2 has 37% identity
with WISP-1 and 32% identity with WISP-3 (Fig. 3A).
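
Percent identity figures such as those quoted above come from counting matching residues in an alignment. The text does not state which alignment program or which denominator was used, so the following Python sketch is only one common convention (identical residues divided by aligned, non-gap columns). The function name and the toy sequences are hypothetical and are not WISP sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap.

    Both inputs must already be aligned (equal length, gaps as '-').
    Other conventions divide by the alignment length or by the shorter
    sequence length and will give different numbers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Toy aligned fragments (hypothetical): 4 of 5 compared columns match.
    print(percent_identity("ACDE-G", "ACEEFG"))
```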

WISPs Are Homologous to the CTGF Family of Proteins. 
Human WISP-1, WISP-2, and WISP-3 are novel sequences; 
however, mouse WISP-1 is the same as the recently identified 
Elml gene. Elml is expressed in low, but not high, metastatic 
mouse melanoma cells, and suppresses the in vivo growth and 
metastatic potential of K-1735 mouse melanoma cells (15). 
Human and mouse WISP-2 are homologous to the recently 
described rat gene, rCop-1 (16). Significant homology (36- 
44%) was seen to the CCN family of growth factors. This family 
includes three members, CTGF, Cyr61, and the protoonco-
gene nov. CTGF is a chemotactic and mitogenic factor for
fibroblasts that is implicated in wound healing and fibrotic
disorders and is induced by TGF-β (17). Cyr61 is an extracel-
lular matrix signaling molecule that promotes cell adhesion,
proliferation, migration, angiogenesis, and tumor growth (18, 
19). nov (nephroblastoma overexpressed) is an immediate 
early gene associated with quiescence and found altered in 
Wilms tumors (20). The proteins of the CCN family share 
functional, but not sequence, similarity to Wnt-1. All are 
secreted, cysteine-rich heparin binding glycoproteins that as- 
sociate with the cell surface and extracellular matrix. 

WISP proteins exhibit the modular architecture of the CCN
family, characterized by four conserved cysteine-rich domains
(Fig. 3B) (21). The N-terminal domain, which includes the first
12 cysteine residues, contains a consensus sequence (GCGCCXXC)
conserved in most insulin-like growth factor (IGF)-binding
proteins (BP). This sequence is conserved in WISP-2
and WISP-3, whereas WISP-1 has a glutamine in the third
position instead of a glycine. CTGF recently has been shown
to specifically bind IGF (22) and a truncated nov protein
lacking the IGF-BP domain is oncogenic (23). The von Wil-
lebrand factor type C module (VWC), also found in certain
collagens and mucins, covers the next 10 cysteine residues, and
is thought to participate in protein complex formation and
oligomerization (24). The VWC domain of WISP-3 differs
from all CCN family members described previously, in that it
contains only six of the 10 cysteine residues (Fig. 3 A and B).
A short variable region follows the VWC domain. The third
module, the thrombospondin (TSP) domain, is involved in
binding to sulfated glycoconjugates and contains six cysteine
residues and a conserved WSxCSxxCG motif first identified in
thrombospondin (25). The C-terminal (CT) module contain-
ing the remaining 10 cysteines is thought to be involved in
dimerization and receptor binding (26). The CT domain is
present in all CCN family members described to date but is
absent in WISP-2 (Fig. 3 A and B). The existence of a putative
signal sequence and the absence of a transmembrane domain
suggest that WISPs are secreted proteins, an observation
supported by an analysis of their expression and secretion from
mammalian cell and baculovirus cultures (data not shown).

Fig. 3. (A) Encoded amino acid sequence alignment of human
WISPs. The cysteine residues of WISP-1 and WISP-2 that are not
present in WISP-3 are indicated with a dot. (B) Schematic represen-
tation of the WISP proteins showing the domain structure and cysteine
residues (vertical lines). The four cysteine residues in the VWC domain
that are absent in WISP-3 are indicated with a dot. (C) Expression of
WISP mRNA in human tissues. PCR was performed on human
multiple-tissue cDNA panels (CLONTECH) from the indicated adult
and fetal tissues.
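
The two sequence motifs named in the domain description, GCGCCXXC in the IGF-BP domain and WSxCSxxCG in the TSP domain, can be located in a protein sequence with simple pattern matching. The Python sketch below is an illustration only. The dictionary keys, function names, and the toy sequence are hypothetical, and the relaxed pattern allowing glutamine at the third position is an assumption based on the statement that WISP-1 carries a glutamine there.

```python
import re

# 'X' or 'x' in the motifs means any residue, written '.' in the regex.
MOTIFS = {
    "IGF-BP consensus GCGCCXXC": re.compile(r"GCGCC..C"),
    # Assumed relaxed form: glycine or glutamine at the third position.
    "IGF-BP relaxed GC[GQ]CCXXC": re.compile(r"GC[GQ]CC..C"),
    "TSP motif WSxCSxxCG": re.compile(r"WS.CS..CG"),
}


def find_motifs(protein):
    """Return {motif name: [(1-based start position, matched text), ...]}."""
    hits = {}
    for name, pattern in MOTIFS.items():
        hits[name] = [(m.start() + 1, m.group())
                      for m in pattern.finditer(protein)]
    return hits


def count_cysteines(protein):
    """Count cysteine residues (one-letter code C)."""
    return protein.count("C")


if __name__ == "__main__":
    toy = "MKAGCGCCPMCAAAWSPCSTTCGLL"  # hypothetical sequence, not a WISP
    for name, found in find_motifs(toy).items():
        print(name, found)
    print("cysteines:", count_cysteines(toy))
```

Counting cysteines in the same way gives a quick check on the per-domain cysteine numbers quoted in the text, provided the domain boundaries are known.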

Expression of WISP mRNA in Human Tissues. Tissue-
specific expression of human WISPs was characterized by PCR
analysis on adult and fetal multiple tissue cDNA panels.
WISP-1 expression was seen in the adult heart, kidney, lung,
pancreas, placenta, ovary, small intestine, and spleen (Fig. 3C).
Little or no expression was detected in the brain, liver, skeletal
muscle, colon, peripheral blood leukocytes, prostate, testis, or
thymus. WISP-2 had a more restricted tissue expression and
was detected in adult skeletal muscle, colon, ovary, and fetal
lung. Predominant expression of WISP-3 was seen in adult
kidney and testis and fetal kidney. Lower levels of WISP-3
expression were detected in placenta, ovary, prostate, and
small intestine.

In Situ Localization of WISP-1 and WISP-2. Expression of
WISP-1 and WISP-2 was assessed by in situ hybridization in
mammary tumors from Wnt-1 transgenic mice. Strong expres-
sion of WISP-1 was observed in stromal fibroblasts lying within
the fibrovascular tumor stroma (Fig. 4 A-D). However, low-
level WISP-1 expression also was observed focally within tumor
cells (data not shown). No expression was observed in normal
breast. Like WISP-1, WISP-2 expression also was seen in the
tumor stroma in breast tumors from Wnt-1 transgenic animals
(Fig. 4 E-H). However, WISP-2 expression in the stroma was
in spindle-shaped cells adjacent to capillary vessels, whereas
the predominant cell type expressing WISP-1 was the stromal
fibroblasts.

Fig. 4. (A, C, E, and G) Representative hematoxylin/eosin-stained
images from breast tumors in Wnt-1 transgenic mice. The correspond-
ing dark-field images showing WISP-1 expression are shown in B and
D. The tumor is a moderately well-differentiated adenocarcinoma
showing evidence of adenoid cystic change. At low power (A and B),
expression of WISP-1 is seen in the delicate branching fibrovascular
tumor stroma (arrowhead). At higher magnification, expression is seen
in the stromal (s) fibroblasts (C and D), and tumor cells are negative.
Focal expression of WISP-1, however, was observed in tumor cells in
some areas. Images of WISP-2 expression are shown in E-H. At low
power (E and F), expression of WISP-2 is seen in cells lying within the
fibrovascular tumor stroma. At higher magnification, these cells
appeared to be adjacent to capillary vessels whereas tumor cells are
negative (G and H).

Chromosome Localization of the WISP Genes. The chro- 
mosomal location of the human WISP genes was determined 
by radiation hybrid mapping panels. WISP-1 is approximately 
3.48 cR from the meiotic marker AFM259xc5 [logarithm of 
odds (lod) score 16.31] on chromosome 8q24.1 to 8q24.3, in the
same region as the human locus of the novH family member 
(27) and roughly 4 Mbs distal to c-myc (28). Preliminary fine 
mapping indicates that WISP-1 is located near D8S1712 STS. 
WISP-2 is linked to the marker SHGC-33922 (lod = 1,000) on 
chromosome 20q12-20q13.1. Human WISP-3 mapped to chro-
mosome 6q22-6q23 and is linked to the marker AFM211ze5
(lod = 1,000). WISP-3 is approximately 18 Mbs proximal to 
CTGF and 23 Mbs proximal to the human cellular oncogene 
MYB (27, 29).

Amplification and Aberrant Expression of WISPs in Human 
Colon Tumors. Amplification of protooncogenes is seen in 
many human tumors and has etiological and prognostic sig- 
nificance. For example, in a variety of tumor types, c-myc 
amplification has been associated with malignant progression 
and poor prognosis (30). Because WISP-1 resides in the same 
general chromosomal location (8q24) as c-myc, we asked 
whether it was a target of gene amplification, and, if so, 
whether this amplification was independent of the c-myc locus. 
Genomic DNA from human colon cancer cell lines was 
assessed by quantitative PCR and Southern blot analysis (Fig.
5 A and B). Both methods detected similar degrees of WISP-1
amplification. Most cell lines showed significant (2- to 4-fold) 
amplification, with the HT-29 and WiDr cell lines demonstrat- 
ing an 8-fold increase. Significantly, the pattern of amplifica- 
tion observed did not correlate with that observed for c-myc, 
indicating that the c-myc gene is not part of the amplicon that 
involves the WISP-1 locus. 

We next examined whether the WISP genes were amplified 
in a panel of 25 primary human colon adenocarcinomas. The 
relative WISP gene copy number in each colon tumor DNA 
was compared with pooled normal DNA from 10 donors by 
quantitative PCR (Fig. 6). The copy number of WISP-1 and 
WISP-2 was significantly greater than one, approximately 
2-fold for WISP-1 in about 60% of the tumors and 2- to 4-fold 
for WISP-2 in 92% of the tumors (P < 0.001 for each). The 
copy number for WISP-3 was indistinguishable from one (P = 
0.166). In addition, the copy number of WISP-2 was signifi-
cantly higher than that of WISP-1 (P < 0.001).

The levels of WISP transcripts in RNA isolated from 19
adenocarcinomas and their matched normal mucosa were
assessed by quantitative PCR (Fig. 7). The level of WISP-1
RNA present in tumor tissue varied but was significantly
increased (2- to >25-fold) in 84% (16/19) of the human colon
tumors examined compared with normal adjacent mucosa.
Four of 19 tumors showed greater than 10-fold overexpression.
In contrast, in 79% (15/19) of the tumors examined, WISP-2
RNA expression was significantly lower in the tumor than the
mucosa. Similar to WISP-1, WISP-3 RNA was overexpressed in
63% (12/19) of the colon tumors compared with the normal
mucosa. The amount of overexpression of WISP-3 ranged from
4- to >40-fold.

Fig. 5. Amplification of WISP-1 genomic DNA in colon cancer cell
lines. (A) Amplification in cell line DNA was determined by quanti-
tative PCR. (B) Southern blots containing genomic DNA (10 µg)
digested with EcoRI (WISP-1) or XbaI (c-myc) were hybridized with
a 100-bp human WISP-1 probe (amino acids 186-219) or a human
c-myc probe (located at bp 1901-2000). The WISP and myc genes are
detected in normal human genomic DNA after a longer film exposure.

[Figure 6 panels: WISP-1, WISP-2, WISP-3; x axis: Tumor Number]

Fig. 6. Genomic amplification of WISP genes in human colon
tumors. The relative gene copy number of the WISP genes in 25
adenocarcinomas was assayed by quantitative PCR, by comparing
DNA from primary human tumors with pooled DNA from 10 healthy
donors. The data are means ± SEM from one experiment done in
triplicate. The experiment was repeated at least three times.

Fig. 7. WISP RNA expression in primary human colon tumors
relative to expression in normal mucosa from the same patient.
Expression of WISP mRNA in 19 adenocarcinomas was assayed by
quantitative PCR. The Dukes stage of the tumor is listed under the
sample number. The data are means ± SEM from one experiment
done in triplicate. The experiment was repeated at least twice.

DISCUSSION 

One approach to understanding the molecular basis of cancer 
is to identify differences in gene expression between cancer 
cells and normal cells. Strategies based on assumptions that 
steady-state mRNA levels will differ between normal and 
malignant cells have been used to clone differentially ex- 
pressed genes (31). We have used a PCR-based selection 
strategy, SSH, to identify genes selectively expressed in 
C57MG mouse mammary epithelial cells transformed by 
Wnt-1.

Three of the genes isolated, WISP-1, WISP-2, and WISP-3,
are members of the CCN family of growth factors, which 
includes CTGF, Cyr61, and nov, a family not previously linked 
to Wnt signaling. 

Two independent experimental systems demonstrated that 
WISP induction was associated with the expression of Wnt-1.
The first was C57MG cells infected with a Wnt-1 retroviral
vector or C57MG cells expressing Wnt-1 under the control of
a tetracycline-repressible promoter, and the second was in
Wnt-1 transgenic mice, where breast tissue expresses Wnt-1,
whereas normal breast tissue does not. No WISP RNA expres-
sion was detected in mammary tumors induced by polyoma
virus middle T antigen (data not shown). These data suggest
a link between Wnt-1 and WISPs in that in these two situations,
WISP induction was correlated with Wnt-1 expression.

It is not clear whether the WISPs are directly or indirectly 
induced by the downstream components of the Wnt-1 signaling
pathway (i.e., β-catenin-TCF-1/Lef1). The increased levels of
WISP RNA were measured in Wnt-1-transformed cells, hours
or days after Wnt-1 transformation. Thus, WISP expression
could result from Wnt-1 signaling directly through β-catenin
transcription factor regulation or alternatively through Wnt-1
signaling turning on a transcription factor, which in turn 
regulates WISPs. 

The WISPs define an additional subfamily of the CCN family 
of growth factors. One striking difference observed in the 
protein sequence of WISP-2 is the absence of a CT domain, 
which is present in CTGF, Cyr61, nov, WISP-1, and WISP-3. 
This domain is thought to be involved in receptor binding and 
dimerization. Growth factors, such as TGF-β, platelet-derived
growth factor, and nerve growth factor, which contain a cystine
knot motif, exist as dimers (32). It is tempting to speculate that
WISP-1 and WISP-3 may exist as dimers, whereas WISP-2 
exists as a monomer. If the CT domain is also important for 
receptor binding, WISP-2 may bind its receptor through a 
different region of the molecule than the other CCN family 
members. No specific receptors have been identified for CTGF 
or nov. A recent report has shown that integrin αvβ3 serves as
an adhesion receptor for Cyr61 (33). 

The strong expression of WISP-1 and WISP-2 in cells lying
within the fibrovascular tumor stroma in breast tumors from
Wnt-1 transgenic animals is consistent with previous obser-
vations that transcripts for the related CTGF gene are pri-
marily expressed in the fibrous stroma of mammary tumors
(34). Epithelial cells are thought to control the proliferation of
connective tissue stroma in mammary tumors by a cascade of
growth factor signals similar to that controlling connective
tissue formation during wound repair. It has been proposed
that mammary tumor cells or inflammatory cells at the tumor
interstitial interface secrete TGF-β1, which is the stimulus for
stromal proliferation (34). TGF-β1 is secreted by a large
percentage of malignant breast tumors and may be one of the
growth factors that stimulates the production of CTGF and
WISPs in the stroma.

It was of interest that WISP-1 and WISP-2 expression was
observed in the stromal cells that surrounded the tumor cells
(epithelial cells) in the Wnt-1 transgenic mouse sections of
breast tissue. This finding suggests that paracrine signaling
could occur in which the stromal cells could supply WISP-1 and
WISP-2 to regulate tumor cell growth on the WISP extracel-
lular matrix. Stromal cell-derived factors in the extracellular
matrix have been postulated to play a role in tumor cell 
migration and proliferation (35). The localization of WISP-1 
and WISP-2 in the stromal cells of breast tumors supports this 
paracrine model. 

An analysis of WISP-1 gene amplification and expression in 
human colon tumors showed a correlation between DNA 
amplification and overexpression, whereas overexpression of 
WISP-3 RNA was seen in the absence of DNA amplification.
In contrast, WISP-2 DNA was amplified in the colon tumors,
but its mRNA expression was significantly reduced in the
majority of tumors compared with the expression in normal
colonic mucosa from the same patient. The gene for human
WISP-2 was localized to chromosome 20q12-20q13, at a region
frequently amplified and associated with poor prognosis in
node-negative breast cancer and many colon cancers, suggest-
ing the existence of one or more oncogenes at this locus
(36-38). Because the center of the 20q13 amplicon has not yet
been identified, it is possible that the apparent amplification 
observed for WISP-2 may be caused by another gene in this 
amplicon. 

A recent manuscript on rCop-1, the rat orthologue of
WISP-2, describes the loss of expression of this gene after cell
transformation, suggesting it may be a negative regulator of
growth in cell lines (16). Although the mechanism by which
WISP-2 RNA expression is down-regulated during malignant 
transformation is unknown, the reduced expression of WISP-2 
in colon tumors and cell lines suggests that it may function as 
a tumor suppressor. These results show that the WISP genes 
are aberrantly expressed in colon cancer and suggest that their 
altered expression may confer selective growth advantage to 
the tumor. 

Members of the Wnt signaling pathway have been impli- 
cated in the pathogenesis of colon cancer, breast cancer, and 
melanoma, including the tumor suppressor gene adenomatous 
polyposis coli and β-catenin (39). Mutations in specific regions
of either gene can cause the stabilization and accumulation of
cytoplasmic β-catenin, which presumably contributes to hu-
man carcinogenesis through the activation of target genes such
as the WISPs. Although the mechanism by which Wnt-1 
transforms cells and induces tumorigenesis is unknown, the 
identification of WISPs as genes that may be regulated down- 
stream of Wnt-1 in C57MG cells suggests they could be 
important mediators of Wnt-1 transformation. The amplifica- 
tion and altered expression patterns of the WISPs in human 
colon tumors may indicate an important role for these genes 
in tumor development.

methods. Peptides AENK or AEQK were dissolved in water, made isotonic with
NaCl and diluted into RPMI growth medium. T-cell-proliferation assays were
done essentially as described 20,21. Briefly, after antigen pulsing (30 µg ml-1
TTCF) with tetrapeptides (1-2 mg ml-1), PBMCs or EBV-B cells were
washed in PBS and fixed for 45 s in 0.05% glutaraldehyde. Glycine was added
to a final concentration of 0.1 M and the cells were washed five times in RPMI
1640 medium containing 1% FCS before co-culture with T-cell clones in
round-bottom 96-well microtitre plates. After 48 h, the cultures were pulsed
with 1 µCi of 3H-thymidine and harvested for scintillation counting 16 h later.
Predigestion of native TTCF was done by incubating 200 µg TTCF with 0.25 µg
pig kidney legumain in 500 µl 50 mM citrate buffer, pH 5.5, for 1 h at 37 °C.

Glycopeptide digestions. The peptides HIDNEEDI, HIDN(N-glucosamine)
EEDI and HIDNESDI, which are based on the TTCF sequence, and
QQQHLFGSNVTDCSGNFCLFR(KKK), which is based on human transferrin,
were obtained by custom synthesis. The three C-terminal lysine residues were
added to the natural sequence to aid solubility. The transferrin glycopeptide
QQQHLFGSNVTDCSGNFCLFR was prepared by tryptic (Promega) digestion
of 5 mg reduced, carboxymethylated human transferrin followed by
concanavalin A chromatography. Glycopeptides corresponding to residues
622-642 and 421-452 were isolated by reverse-phase HPLC and identified by
mass spectrometry and N-terminal sequencing. The lyophilized transferrin-
derived peptides were redissolved in 50 mM sodium acetate, pH 5.5, 10 mM
dithiothreitol, 20% methanol. Digestions were performed for 3 h at 30 °C with
5-50 mU ml-1 pig kidney legumain or B-cell AEP. Products were analysed by
HPLC or MALDI-TOF mass spectrometry using a matrix of 10 mg ml-1 α-
cyanocinnamic acid in 50% acetonitrile/0.1% TFA and a PerSeptive Biosystems
Elite STR mass spectrometer set to linear or reflector mode. Internal standar-
dization was obtained with a matrix ion of 568.13 mass units.

Fas ligand (FasL) is produced by activated T cells and natural
killer cells and it induces apoptosis (programmed cell death) in
target cells through the death receptor Fas/Apo1/CD95 (ref. 1).
One important role of FasL and Fas is to mediate immune-
cytotoxic killing of cells that are potentially harmful to the
organism, such as virus-infected or tumour cells 1. Here we
report the discovery of a soluble decoy receptor, termed decoy 
receptor 3 (DcR3), that binds to FasL and inhibits FasL-induced 
apoptosis. The DcR3 gene was amplified in about half of 35 
primary lung and colon tumours studied, and DcR3 messenger 
RNA was expressed in malignant tissue. Thus, certain tumours 
may escape FasL-dependent immune-cytotoxic attack by expres-
sing a decoy receptor that blocks FasL.

By searching expressed sequence tag (EST) databases, we identi- 
fied a set of related ESTs that showed homology to the tumour 
necrosis factor (TNF) receptor (TNFR) gene superfamily 2 . Using 
the overlapping sequence, we isolated a previously unknown full-
length complementary DNA from human fetal lung. We named the
protein encoded by this cDNA decoy receptor 3 (DcR3). The cDNA
encodes a 300-amino-acid polypeptide that resembles members of
the TNFR family (Fig. 1a): the amino terminus contains a leader
sequence, which is followed by four tandem cysteine-rich domains
(CRDs). Like one other TNFR homologue, osteoprotegerin (OPG) 3,
DcR3 lacks an apparent transmembrane sequence, which indicates
that it may be a secreted, rather than a membrane-associated,
molecule. We expressed a recombinant, histidine-tagged form of 
DcR3 in mammalian cells; DcR3 was secreted into the cell culture 
medium, and migrated on polyacrylamide gels as a protein of 
relative molecular mass 35,000 (data not shown). DcR3 shares 
sequence identity in particular with OPG (31%) and TNFR2 
(29%), and has relatively less homology with Fas (17%). All of 
the cysteines in the four CRDs of DcR3 and OPG are conserved; 
however, the carboxy-terminal portion of DcR3 is 101 residues
shorter. 

We analysed expression of DcR3 mRNA in human tissues by 
northern blotting (Fig. 1b). We detected a predominant 1.2-kilobase
transcript in fetal lung, brain, and liver, and in adult spleen, colon
and lung. In addition, we observed relatively high DcR3 mRNA
expression in the human colon carcinoma cell line SW480. 

To investigate potential ligand interactions of DcR3, we generated 
a recombinant, Fc-tagged DcR3 protein. We tested binding of 
DcR3-Fc to human 293 cells transfected with individual TNF- 
family ligands, which are expressed as type 2 transmembrane 
proteins (these transmembrane proteins have their N termini in 
the cytosol). DcR3-Fc showed a significant increase in binding to 
cells transfected with FasL 4 (Fig. 2a), but not to cells transfected with
TNF 5, Apo2L/TRAIL 6,7, Apo3L/TWEAK 8,9, or OPGL/TRANCE/
RANKL 10-12 (data not shown). DcR3-Fc immunoprecipitated shed
FasL from FasL-transfected 293 cells (Fig. 2b) and purified soluble
FasL (Fig. 2c), as did the Fc-tagged ectodomain of Fas but not
TNFR1. Gel-filtration chromatography showed that DcR3-Fc and
soluble FasL formed a stable complex (Fig. 2d). Equilibrium
analysis indicated that DcR3-Fc and Fas-Fc bound to soluble
FasL with a comparable affinity (Kd = 0.8 ± 0.2 and
1.1 ± 0.1 nM, respectively; Fig. 2e), and that DcR3-Fc could
block nearly all of the binding of soluble FasL to Fas-Fc (Fig. 2e, 
inset). Thus, DcR3 competes with Fas for binding to FasL. 
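
To illustrate what comparable sub-nanomolar to nanomolar affinities imply, the following is a minimal sketch of equilibrium occupancy under a simple one-site binding model. The one-site model is an assumption made here for illustration; the text does not state which model was fitted. The function and constant names are hypothetical, and only the two dissociation-constant values (0.8 and 1.1 nM) come from the passage above.

```python
def fraction_bound(ligand_nm: float, kd_nm: float) -> float:
    """Fraction of receptor occupied at equilibrium for a one-site model.

    occupancy = [L] / (Kd + [L]), with free ligand and Kd in the same units.
    """
    if ligand_nm < 0 or kd_nm <= 0:
        raise ValueError("ligand must be >= 0 and Kd must be > 0")
    return ligand_nm / (kd_nm + ligand_nm)


# Dissociation constants quoted in the text (nM); the model itself is assumed.
KD_DCR3_FC = 0.8
KD_FAS_FC = 1.1

# Hypothetical free-ligand concentrations (nM) chosen only for illustration.
for conc in (0.1, 1.0, 10.0):
    dcr3 = fraction_bound(conc, KD_DCR3_FC)
    fas = fraction_bound(conc, KD_FAS_FC)
    print(f"{conc:5.1f} nM  DcR3-Fc {dcr3:.2f}  Fas-Fc {fas:.2f}")
```

Under this model the two occupancy curves stay close across the whole range, which is consistent with the observation that DcR3-Fc can compete with Fas-Fc for soluble FasL.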

To determine whether binding of DcR3 inhibits FasL activity, we 
tested the effect of DcR3-Fc on apoptosis induction by soluble
FasL in Jurkat T leukaemia cells, which express Fas (Fig. 3a). DcR3-
Fc and Fas-Fc blocked soluble-FasL-induced apoptosis in a
similar dose-dependent manner, with half-maximal inhibition at
~0.1 µg ml-1. Time-course analysis showed that the inhibition did
not merely delay cell death, but rather persisted for at least 24 hours
(Fig. 3b). We also tested the effect of DcR3-Fc on activation-
induced cell death (AICD) of mature T lymphocytes, a FasL-
dependent process 1. Consistent with previous results 13, activation
of interleukin-2-stimulated CD4-positive T cells with anti-CD3
antibody increased the level of apoptosis twofold, and Fas-Fc
blocked this effect substantially (Fig. 3c); DcR3-Fc blocked the
induction of apoptosis to a similar extent. Thus, DcR3 binding
blocks apoptosis induction by FasL. 

FasL-induced apoptosis is important in elimination of virus- 
infected cells and cancer cells by natural killer cells and cytotoxic T 
lymphocytes; an alternative mechanism involves perforin and 
granzymes 1,14-16. Peripheral blood natural killer cells triggered
marked cell death in Jurkat T leukaemia cells (Fig. 3d); DcR3-Fc
and Fas-Fc each reduced killing of target cells from ~65% to
~30%, with half-maximal inhibition at ~1 µg ml-1; the residual
killing was probably mediated by the perforin/granzyme pathway.
Thus, DcR3 binding blocks FasL-dependent natural killer cell
activity. Higher DcR3-Fc and Fas-Fc concentrations were required
to block natural killer cell activity compared with those required to
block soluble FasL activity, which is consistent with the greater
potency of membrane-associated FasL compared with soluble
FasL 17.

Given the role of immune-cytotoxic cells in elimination of 
tumour cells and the fact that DcR3 can act as an inhibitor of 
FasL, we proposed that DcR3 expression might contribute to the 
ability of some tumours to escape immune-cytotoxic attack. As 
genomic amplification frequently contributes to tumorigenesis, we 
investigated whether the DcR3 gene is amplified in cancer. We 
analysed DcR3 gene-copy number by quantitative polymerase chain

Figure 1 Primary structure and expression of human DcR3. a, Alignment of the
amino-acid sequences of DcR3 and of osteoprotegerin (OPG); the C-terminal 101
residues of OPG are not shown. The putative signal cleavage site (arrow), the
cysteine-rich domains (CRD 1-4), and the N-linked glycosylation site (asterisk) are
shown. b, Expression of DcR3 mRNA. Northern hybridization analysis was done
using the DcR3 cDNA as a probe and blots of poly(A)+ RNA (Clontech) from
human fetal and adult tissues or cancer cell lines. PBL, peripheral blood
lymphocyte.



Figure 2 Interaction of DcR3 with FasL. a, 293 cells were transfected with pRK5
vector (top) or with pRK5 encoding full-length FasL (bottom), incubated with
DcR3-Fc (solid line, shaded area), TNFR1-Fc (dotted line) or buffer control
(dashed line) (the dashed and dotted lines overlap), and analysed for binding by
FACS. Statistical analysis showed a significant difference (P < 0.001) between the
binding of DcR3-Fc to cells transfected with FasL or pRK5. PE, phycoerythrin-
labelled cells. b, 293 cells were transfected as in a and metabolically labelled, and
cell supernatants were immunoprecipitated with Fc-tagged TNFR1, DcR3 or Fas.
c, Purified soluble FasL (sFasL) was immunoprecipitated with TNFR1-Fc, DcR3-
Fc or Fas-Fc and visualized by immunoblot with anti-FasL antibody. sFasL was
loaded directly for comparison in the right-hand lane. d, Flag-tagged sFasL was
incubated with DcR3-Fc or with buffer and resolved by gel filtration; column
fractions were analysed in an assay that detects complexes containing DcR3-Fc
and sFasL-Flag. e, Equilibrium binding of DcR3-Fc or Fas-Fc to sFasL-Flag.
Inset, competition of DcR3-Fc with Fas-Fc for binding to sFasL-Flag.

reaction (PCR) 18 in genomic DNA from 35 primary lung and colon
tumours, relative to pooled genomic DNA from peripheral blood 
leukocytes (PBLs) of 10 healthy donors. Eight of 18 lung tumours 
and 9 of 17 colon tumours showed DcR3 gene amplification, 
ranging from 2- to 18-fold (Fig. 4a, b). To confirm this result, we 
analysed the colon tumour DNAs with three more, independent sets 
of DcR3-based PCR primers and probes; we observed nearly the
same amplification (data not shown). 

We then analysed DcR3 mRNA expression in primary tumour 
tissue sections by in situ hybridization. We detected DcR3 expres- 
sion in 6 out of 15 lung tumours, 2 out of 2 colon tumours, 2 out of 5 
breast tumours, and 1 out of 1 gastric tumour (data not shown). A
section through a squamous-cell carcinoma of the lung is shown in 
Fig. 4c. DcR3 mRNA was localized to infiltrating malignant epithe- 
lium, but was essentially absent from adjacent stroma, indicating 
tumour-specific expression. Although the individual tumour speci- 
mens that we analysed for mRNA expression and gene amplification 
were different, the in situ hybridization results are consistent with 
the finding that the DcR3 gene is amplified frequently in tumours. 
SW480 colon carcinoma cells, which showed abundant DcR3 
mRNA expression (Fig. 1b), also had marked DcR3 gene amplifica-
tion, as shown by quantitative PCR (fourfold) and by Southern blot
hybridization (fivefold) (data not shown). 

If DcR3 amplification in cancer is functionally relevant, then 
DcR3 should be amplified more than neighbouring genomic 
regions that are not important for tumour survival. To test this,
we mapped the human DcR3 gene by radiation-hybrid analysis;
DcR3 showed linkage to marker AFM218xe7 (T160), which maps to
chromosome position 20q13. Next, we isolated from a bacterial
artificial chromosome (BAC) library a human genomic clone that 
carries DcR3, and sequenced the ends of the clone's insert. We then 
determined, from the nine colon tumours that showed twofold or 
greater amplification of DcR3, the copy number of the DcR3- 
flanking sequences (reverse and forward) from the BAC, and of 
seven genomic markers that span chromosome 20 (Fig. 4d). The 
DcR3-linked reverse marker showed an average amplification of
roughly threefold, slightly less than the approximately fourfold 
amplification of DcR3; the other markers showed little or no 
amplification. These data indicate that DcR3 may be at the 'epi-
centre' of a distal chromosome 20 region that is amplified in colon
cancer, consistent with the possibility that DcR3 amplification 
promotes tumour survival. 

Our results show that DcR3 binds specifically to FasL and inhibits 
FasL activity. We did not detect DcR3 binding to several other TNF-
ligand-family members; however, this does not rule out the possi-
bility that DcR3 interacts with other ligands, as do some other
TNFR family members, including OPG 2,19.

FasL is important in regulating the immune response; however, 
little is known about how FasL function is controlled. One mechan-
ism involves the molecule cFLIP, which modulates apoptosis signal-
ling downstream of Fas 20. A second mechanism involves proteolytic
shedding of FasL from the cell surface 17. DcR3 competes with Fas for

Figure 3 Inhibition of FasL activity by DcR3. a, Human Jurkat T leukaemia cells
were incubated with Flag-tagged soluble FasL (sFasL; 5 ng ml-1) oligomerized
with anti-Flag antibody (0.1 µg ml-1) in the presence of the proposed inhibitors
DcR3-Fc, Fas-Fc or human IgG1 and assayed for apoptosis (mean ± s.e.m. of
triplicates). b, Jurkat cells were incubated with sFasL-Flag plus anti-Flag antibody
as in a, in presence of 1 µg ml-1 DcR3-Fc (filled circles), Fas-Fc (open circles) or
human IgG1 (triangles), and apoptosis was determined at the indicated time
points. c, Peripheral blood T cells were stimulated with PHA and interleukin-2,
followed by control (white bars) or anti-CD3 antibody (filled bars), together with
phosphate-buffered saline (PBS), human IgG1, Fas-Fc, or DcR3-Fc (10 µg ml-1).
After 16 h, apoptosis of CD4+ cells was determined (mean ± s.e.m. of results from
five donors). d, Peripheral blood natural killer cells were incubated with 51Cr-
labelled Jurkat cells in the presence of DcR3-Fc (filled circles), Fas-Fc (open
circles) or human IgG1 (triangles), and target-cell death was determined by
release of 51Cr (mean ± s.d. for two donors, each in triplicate).




Figure 4 Genomic amplification of DcR3 in tumours. a, Lung cancers, comprising
eight adenocarcinomas (c, d, f, g, h, j, k, r), seven squamous-cell carcinomas (a, e,
m, n, o, p, q), one non-small-cell carcinoma (b), one small-cell carcinoma (i), and
one bronchial adenocarcinoma (l). The data are means ± s.d. of 2 experiments
done in duplicate. b, Colon tumours, comprising 17 adenocarcinomas. Data are
means ± s.e.m. of five experiments done in duplicate. c, In situ hybridization
analysis of DcR3 mRNA expression in a squamous-cell carcinoma of the lung. A
representative bright-field image (left) and the corresponding dark-field image
(right) show DcR3 mRNA over infiltrating malignant epithelium (arrowheads).
Adjacent non-malignant stroma (S), blood vessel (V) and necrotic tumour tissue
(N) are also shown. d, Average amplification of DcR3 compared with amplifica-
tion of neighbouring genomic regions (reverse and forward, Rev and Fwd), the
DcR3-linked marker T160, and other chromosome-20 markers, in the nine colon
tumours showing DcR3 amplification of twofold or more (b). Data are from two
experiments done in duplicate. Asterisk indicates P < 0.01 for a Student's t-test
comparing each marker with DcR3.

FasL binding; hence, it may represent a third mechanism of
extracellular regulation of FasL activity. A decoy receptor that 
modulates the function of the cytokine interleukin-1 has been 
described 21 . In addition, two decoy receptors that belong to the 
TNFR family, DcR1 and DcR2, regulate the FasL-related apoptosis-
inducing molecule Apo2L 22. Unlike DcR1 and DcR2, which are
membrane-associated proteins, DcR3 is directly secreted into the
extracellular space. One other secreted TNFR-family member is
OPG 3, which shares greater sequence homology with DcR3 (31%)
than do DcR1 (17%) or DcR2 (19%); OPG functions as a third
decoy for Apo2L. Thus, DcR3 and OPG define a new subset of
TNFR-family members that function as secreted decoys to mod-
ulate ligands that induce apoptosis. Pox viruses produce soluble
TNFR homologues that neutralize specific TNF-family ligands,
thereby modulating the antiviral immune response 2. Our results
indicate that a similar mechanism, namely, production of a soluble
decoy receptor for FasL, may contribute to immune evasion by
certain tumours.



Methods 

Isolation of DcR3 cDNA. Several overlapping ESTs in GenBank (accession 
numbers AA025672, AA025673 and W67560) and in Lifeseq™ (Incyte 
Pharmaceuticals; accession numbers 1339238, 1533571, 1533650, 1542861,
1789372 and 2207027) showed similarity to members of the TNFR family. We 
screened human cDNA libraries by PCR with primers based on the region of 
EST consensus; fetal lung was positive for a product of the expected size. By 
hybridization to a PCR-generated probe based on the ESTs, one positive clone 
(DNA30942) was identified. When searching for potential alternatively spliced 
forms of DcR3 that might encode a transmembrane protein, we isolated 50
more clones; the coding regions of these clones were identical in size to that of
the initial clone (data not shown). 

Fc-fusion proteins (immunoadhesins). The entire DcR3 sequence, or the 
ectodomain of Fas or TNFR1, was fused to the hinge and Fc region of human
IgG1, expressed in insect SF9 cells or in human 293 cells, and purified as
described 13.

Fluorescence-activated cell sorting (FACS) analysis. We transfected 293 
cells using calcium phosphate or Effectene (Qiagen) with pRK5 vector or pRK5
encoding full-length human FasL 4 (2 µg), together with pRK5 encoding CrmA
(2 µg) to prevent cell death. After 16 h, the cells were incubated with
biotinylated DcR3-Fc or TNFR1-Fc and then with phycoerythrin-conjugated
streptavidin (GibcoBRL), and were assayed by FACS. The data were analysed by
Kolmogorov-Smirnov statistical analysis. There was some detectable staining
of vector-transfected cells by DcR3-Fc; as these cells express little FasL (data
not shown), it is possible that DcR3 recognized some other factor that is
expressed constitutively on 293 cells.

Immunoprecipitation. Human 293 cells were transfected as above, and 
metabolically labelled with [35S]cysteine and [35S]methionine (0.5 mCi;
Amersham). After 16 h of culture in the presence of z-VAD-fmk (10 µM),
the medium was immunoprecipitated with DcR3-Fc, Fas-Fc or TNFR1-Fc
(5 µg), followed by protein A-Sepharose (Repligen). The precipitates were
resolved by SDS-PAGE and visualized on a phosphorimager (Fuji BAS2000).
Alternatively, purified, Flag-tagged soluble FasL (1 µg) (Alexis) was incubated
with each Fc-fusion protein (1 µg), precipitated with protein A-Sepharose,
resolved by SDS-PAGE and visualized by immunoblotting with rabbit anti- 
FasL antibody (Oncogene Research). 

Analysis of complex formation. Flag-tagged soluble FasL (25 µg) was
incubated with buffer or with DcR3-Fc (40 µg) for 1.5 h at 24 °C. The reaction
was loaded onto a Superdex 200 HR 10/30 column (Pharmacia) and developed
with PBS; 0.6-ml fractions were collected. The presence of DcR3-Fc-FasL
complex in each fraction was analysed by placing 100 µl aliquots into microtitre
wells precoated with anti-human IgG (Boehringer) to capture DcR3-Fc, 
followed by detection with biotinylated anti-Flag antibody Bio M2 (Kodak) and 
streptavidin-horseradish peroxidase (Amersham). Calibration of the column 
indicated an apparent relative molecular mass of the complex of 420 K (data not 
shown), which is consistent with a stoichiometry of two DcR3-Fc homodimers 
to two soluble FasL homotrimers. 

Equilibrium binding analysis. Microtitre wells were coated with anti-human
IgG and blocked with 2% BSA in PBS. DcR3-Fc or Fas-Fc was added, followed by
serially diluted Flag-tagged soluble FasL. Bound ligand was detected with anti-
Flag antibody as above. In the competition assay, Fas-Fc was immobilized as
above, and the wells were blocked with excess IgG1 before addition of Flag-
tagged soluble FasL plus DcR3-Fc.

T-cell AICD. CD3+ lymphocytes were isolated from peripheral blood of
individual donors using anti-CD3 magnetic beads (Miltenyi Biotech),
stimulated with phytohaemagglutinin (PHA; 2 µg ml-1) for 24 h, and cultured
in the presence of interleukin-2 (100 U ml-1) for 5 days. The cells were plated in
wells coated with anti-CD3 antibody (Pharmingen) and analysed for apoptosis
16 h later by FACS analysis of annexin-V-binding of CD4+ cells.

Natural killer cell activity. Natural killer cells were isolated from peripheral
blood of individual donors using anti-CD56 magnetic beads (Miltenyi
Biotech), and incubated for 16 h with 51Cr-loaded Jurkat cells at an effector-
to-target ratio of 1:1 in the presence of DcR3-Fc, Fas-Fc or human IgG1.
Target-cell death was determined by release of 51Cr in effector-target co-
cultures relative to release of 51Cr by detergent lysis of equal numbers of Jurkat
cells.
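
The readout described above is a simple ratio, which can be sketched as follows. This is only an illustration of the stated calculation (release in co-culture relative to release by detergent lysis); the function name and the counts are hypothetical. Many chromium-release protocols also subtract spontaneous release from both terms, but the text does not mention that step, so it is not included here.

```python
def percent_target_cell_death(coculture_release: float,
                              detergent_release: float) -> float:
    """Percent target-cell death from 51Cr release.

    coculture_release: counts released in the effector-target co-culture.
    detergent_release: counts released by detergent lysis of an equal
        number of target cells (taken as 100% lysis).
    """
    if detergent_release <= 0:
        raise ValueError("detergent (maximal) release must be positive")
    return 100.0 * coculture_release / detergent_release


# Hypothetical counts per minute, chosen only to illustrate the arithmetic.
print(percent_target_cell_death(650.0, 1000.0))
```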

Gene-amplification analysis. Surgical specimens were provided by J. Kern 
(lung tumours) and P. Quirke (colon tumours). Genomic DNA was extracted 
(Qiagen) and the concentration was determined using Hoechst dye 33258 
intercalation fluorometry. Amplification was determined by quantitative PCR
using a TaqMan instrument (ABI). The method was validated by comparison of
PCR and Southern hybridization data for the Myc and HER-2 oncogenes (data
not shown). Gene-specific primers and fluorogenic probes were designed on 
the basis of the sequence of DcR3 or of nearby regions identified on a BAC 
carrying the human DcR3 gene; alternatively, primers and probes were based 
on Stanford Human Genome Center marker AFM218xe7 (T160), which is 
linked to DcR3 (likelihood score = 5.4), SHGC-36268 (T159), the nearest
available marker which maps to ~500 kilobases from T160, and five extra
markers that span chromosome 20. The DcR3-specific primer sequences were
5'-CTTCTTCGCGCACGCTG-3' and 5'-ATCACGCCGGCACCAG-3' and the
fluorogenic probe sequence was
5'-(FAM)-ACACGATGCGTGCTCCAAGCAGAAp-(TAMRA), where FAM is
5'-fluorescein phosphoramidite. Relative
gene-copy numbers were derived using the formula 2^(ΔCt), where ΔCt is the
difference in amplification cycles required to detect DcR3 in peripheral blood
lymphocyte DNA compared to test DNA.
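
The copy-number formula can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code. It assumes that the cycle difference is taken as reference (peripheral blood lymphocyte DNA) minus test (tumour DNA), so that a gene detected earlier in tumour DNA gives a value above 1, and that equal amounts of input DNA are compared. The function name and the cycle values are hypothetical.

```python
def relative_copy_number(ct_reference: float, ct_test: float) -> float:
    """Relative gene-copy number as 2 ** delta_ct.

    ct_reference: cycles needed to detect the gene in normal (PBL) DNA.
    ct_test: cycles needed to detect the gene in test (tumour) DNA.
    The base of 2 reflects the assumed doubling of product in each cycle.
    """
    delta_ct = ct_reference - ct_test
    return 2.0 ** delta_ct


# Hypothetical threshold cycles: tumour DNA crosses threshold 2 cycles earlier.
normal_ct = 27.0
tumour_ct = 25.0
print(relative_copy_number(normal_ct, tumour_ct))
```

A two-cycle head start corresponds to a fourfold relative copy number, and equal cycle numbers give a value of 1 (no amplification).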
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ABC transporters (also known as traffic ATPases) form a large 
family of proteins responsible for the translocation of a variety 
of compounds across membranes of both prokaryotes and 
eukaryotes 1 . The recently completed Escherichia coli genome 
sequence revealed that the largest family of paralogous E. coli
proteins is composed of ABC transporters 2. Many eukaryotic
proteins of medical significance belong to this family, such as
the cystic fibrosis transmembrane conductance regulator (CFTR),
the P-glycoprotein (or multidrug-resistance protein) and the
heterodimeric transporter associated with antigen processing
(Tap1-Tap2). Here we report the crystal structure at 1.5 Å resolu-
tion of HisP, the ATP-binding subunit of the histidine permease,
which is an ABC transporter from Salmonella typhimurium. We 
correlate the details of this structure with the biochemical, genetic 
and biophysical properties of the wild-type and several mutant 
HisP proteins. The structure provides a basis for understanding 
properties of ABC transporters and of defective CFTR proteins. 

ABC transporters contain four structural domains: two nucleo- 
tide-binding domains (NBDs), which are highly conserved 
throughout the family, and two transmembrane domains 1 . In 
prokaryotes these domains are often separate subunits which are 
assembled into a membrane-bound complex; in eukaryotes the 
domains are generally fused into a single polypeptide chain. The 
periplasmic histidine permease of S. typhimurium and E. coli 1,3-8 is a
well-characterized ABC transporter that is a good model for this 
superfamily. It consists of a membrane-bound complex, HisQMP2,
which comprises integral membrane subunits, HisQ and HisM, and 
two copies of HisP, the ATP-binding subunit. HisP, which has 
properties intermediate between those of integral and peripheral 
membrane proteins 9 , is accessible from both sides of the membrane, 
presumably by its interaction with HisQ and HisM 6 . The two HisP 
subunits form a dimer, as shown by their cooperativity in ATP 
hydrolysis 5 , the requirement for both subunits to be present for 
activity 8 , and the formation of a HisP dimer upon chemical cross- 
linking. Soluble HisP also forms a dimer 3 . HisP has been purified 
and characterized in an active soluble form 3 which can be recon-
stituted into a fully active membrane-bound complex.

The overall shape of the crystal structure of the HisP monomer is
that of an 'L' with two thick arms (arm I and arm II); the
ATP-binding pocket is near the end of arm I (Fig. 1). A six-stranded
β-sheet (β3 and β8-β12) spans both arms of the L, with a domain of
α- plus β-type structure (β1, β2, β4-β7, α1 and α2) on one side
(within arm I) and a domain of mostly α-helices (α3-α9) on the

Figure 1 Crystal structure of HisP. a, View of the dimer along an axis
perpendicular to its two-fold axis. The top and bottom of the dimer are suggested
to face towards the periplasmic and cytoplasmic sides, respectively (see text).
The thickness of arm II is about 25 Å, comparable to that of membrane. α-Helices
are shown in orange and β-sheets in green. b, View along the two-fold axis of the
HisP dimer, showing the relative displacement of the monomers not apparent in
a. The β-strands at the dimer interface are labelled. c, View of one monomer from
the bottom of arm I, as shown in a, towards arm II, showing the ATP-binding
pocket. a-c, The protein and the bound ATP are in 'ribbon' and 'ball-and-stick'
representations, respectively. Key residues discussed in the text are indicated in
c. These figures were prepared with MOLSCRIPT 29 . N, amino terminus; C, C
terminus.

Gene amplification is a common event in the progression of
human cancers, and amplified oncogenes have been shown to 
have diagnostic, prognostic and therapeutic relevance. A 
kinetic quantitative polymerase-chain-reaction (PCR) method,
based on fluorescent TaqMan methodology and a new instru- 
ment (ABI Prism 7700 Sequence Detection System) capable 
of measuring fluorescence in real-time, was used to quantify 
gene amplification in tumor DNA. Reactions are character- 
ized by the point during cycling when PCR amplification is still 
in the exponential phase, rather than the amount of PCR 
product accumulated after a fixed number of cycles. None of 
the reaction components is limited during the exponential 
phase, meaning that values are highly reproducible in reac- 
tions starting with the same copy number. This greatly 
improves the precision of DNA quantification. Moreover, 
real-time PCR does not require post-PCR sample handling, 
thereby preventing potential PCR-product carry-over con- 
tamination; it possesses a wide dynamic range of quantifica- 
tion and results in much faster and higher sample throughput. 
The real-time PCR method was used to develop and validate
a simple and rapid assay for the detection and quantification
of the 3 most frequently amplified genes (myc, ccnd1 and
erbB2) in breast tumors. Extra copies of myc, ccnd1 and erbB2
were observed in 10, 23 and 15%, respectively, of 108 breast- 
tumor DNA; the largest observed numbers of gene copies 
were 4.6, 18.6 and 15.1, respectively. These results correlated 
well with those of Southern blotting. The use of this new 
semi-automated technique will make molecular analysis of
human cancers simpler and more reliable, and should find
broad applications in clinical and research settings. Int. J.
Cancer 78:661-666, 1998.
© 1998 Wiley-Liss, Inc.

Gene amplification plays an important role in the pathogenesis 
of various solid tumors, including breast cancer, probably because 
over-expression of the amplified target genes confers a selective 
advantage. The first technique used to detect genomic amplification 
was cytogenetic analysis. Amplification of several chromosome 
regions, visualized either as extrachromosomal double minutes 
(dmins) or as integrated homogeneously staining regions (HSRs), 
are among the main visible cytogenetic abnormalities in breast 
tumors. Other techniques such as comparative genomic hybridization
(CGH) (Kallioniemi et al., 1994) have also been used in broad
searches for regions of increased DNA copy numbers in tumor 
cells, and have revealed some 20 amplified chromosome regions in 
breast tumors. Positional cloning efforts are underway to identify 
the critical gene(s) in each amplified region. To date, genes known 
to be amplified frequently in breast cancers include myc (8q24),
ccnd1 (11q13), and erbB2 (17q12-q21) (for review, see Bieche and
Lidereau, 1995). 

Amplification of the myc, ccnd1, and erbB2 proto-oncogenes
should have clinical relevance in breast cancer, since independent
studies have shown that these alterations can be used to identify
sub-populations with a worse prognosis (Berns et al., 1992;
Schuuring et al., 1992; Slamon et al., 1987). Muss et al. (1994)
suggested that these gene alterations may also be useful for the 
prediction and assessment of the efficacy of adjuvant chemotherapy 
and hormone therapy. 

However, published results diverge both in terms of the fre- 
quency of these alterations and their clinical value. For instance, 
over 500 studies in 10 years have failed to resolve the controversy
surrounding the link suggested by Slamon et al. (1987) between
erbB2 amplification and disease progression. These discrepancies
are partly due to the clinical, histological and ethnic heterogeneity 
of breast cancer, but technical considerations are also probably 
involved. 

Specific genes (DNA) were initially quantified in tumor cells by 
means of blotting procedures such as Southern and slot blotting. 
These batch techniques require large amounts of DNA (5-10 
µg/reaction) to yield reliable quantitative results. Furthermore,
meticulous care is required at all stages of the procedures to 
generate blots of sufficient quality for reliable dosage analysis. 
Recently, PCR has proven to be a powerful tool for quantitative 
DNA analysis, especially with minimal starting quantities of tumor 
samples (small, early-stage tumors and formalin-fixed, paraffin- 
embedded tissues). 

Quantitative PCR can be performed by evaluating the amount of 
product either after a given number of cycles (end-point quantita- 
tive PCR) or after a varying number of cycles during the 
exponential phase (kinetic quantitative PCR). In the first case, an 
internal standard distinct from the target molecule is required to 
ascertain PCR efficiency. The method is relatively easy but implies 
generating, quantifying and storing an internal standard for each 
gene studied. Nevertheless, it is the most frequently applied 
method to date. 

One of the major advantages of the kinetic method is its rapidity 
in quantifying a new gene, since no internal standard is required (an 
external standard curve is sufficient). Moreover, the kinetic method
has a wide dynamic range (at least 5 orders of magnitude), giving 
an accurate value for samples differing in their copy number. 
Unfortunately, the method is cumbersome and has therefore been 
rarely used. It involves aliquot sampling of each assay mix at 
regular intervals and quantifying, for each aliquot, the amplifica- 
tion product. Interest in the kinetic method has been stimulated by a 
novel approach using fluorescent TaqMan methodology and a new 
instrument (ABI Prism 7700 Sequence Detection System) capable 
of measuring fluorescence in real time (Gibson et al., 1996; Heid et
al., 1996). The TaqMan reaction is based on the 5' nuclease assay
first described by Holland et al. (1991). The latter uses the 5'
nuclease activity of Taq polymerase to cleave a specific fluorogenic
oligonucleotide probe during the extension phase of PCR. The
approach uses dual-labeled fluorogenic hybridization probes (Lee
et al., 1993). One fluorescent dye, covalently linked to the 5' end
of the oligonucleotide, serves as a reporter [FAM (i.e.,
6-carboxy-fluorescein)] and its emission spectrum is quenched by a second
fluorescent dye, TAMRA (i.e., 6-carboxy-tetramethyl-rhodamine)
attached to the 3' end. During the extension phase of the PCR
cycle, the fluorescent hybridization probe is hydrolyzed by the
5'-3' nucleolytic activity of DNA polymerase. Nuclease degrada- 
tion of the probe releases the quenching of FAM fluorescence 
emission, resulting in an increase in peak fluorescence emission. 
The fluorescence signal is normalized by dividing the emission 
intensity of the reporter dye (FAM) by the emission intensity of a 
reference dye (i.e., ROX, 6-carboxy-X-rhodamine) included in 
TaqMan buffer, to obtain a ratio defined as the Rn (normalized 
reporter) for a given reaction tube. The use of a sequence detector 
enables the fluorescence spectra of all 96 wells of the thermal 
cycler to be measured continuously during PCR amplification. 
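
As an illustration only, the normalization described above can be sketched in Python. The function names are hypothetical, and taking the mean of the early cycles as the baseline is an assumption (the text defines Rn as the FAM/ROX ratio, and the Figure 1 legend refers to a baseline established in the first 15 cycles); this is not the instrument's software.

```python
def normalized_reporter(fam_emission, rox_emission):
    """Rn: reporter dye (FAM) emission divided by reference dye (ROX) emission."""
    if rox_emission <= 0:
        raise ValueError("reference dye emission must be positive")
    return fam_emission / rox_emission


def delta_rn(rn_per_cycle, baseline_cycles=15):
    """Subtract a baseline from each cycle's Rn.

    Assumption: the baseline is the mean Rn over the first `baseline_cycles`
    cycles; the source does not say how the baseline is computed.
    """
    early = rn_per_cycle[:baseline_cycles]
    baseline = sum(early) / len(early)
    return [rn - baseline for rn in rn_per_cycle]
```

Dividing by the passive reference dye removes well-to-well differences in volume and optics, so that only the probe-derived signal is compared between tubes.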

The real-time PCR method offers several advantages over other 
current quantitative PCR methods (Celi et al., 1994): (i) the
probe-based homogeneous assay provides a real-time method for
detecting only specific amplification products, since specific
hybridization of both the primers and the probe is necessary to generate a
signal; (ii) the Ct (threshold cycle) value used for quantification is
measured when PCR amplification is still in the log phase of PCR
product accumulation. This is the main reason why Ct is a more
reliable measure of the starting copy number than are end-point
measurements, in which a slight difference in a limiting component
can have a drastic effect on the amount of product; (iii) use of Ct
values gives a wider dynamic range (at least 5 orders of
magnitude), reducing the need for serial dilution; (iv) the real-time PCR
method is run in a closed-tube system and requires no post-PCR
sample handling, thus avoiding potential contamination; (v) the
system is highly automated, since the instrument continuously
measures fluorescence in all 96 wells of the thermal cycler during
PCR amplification and the corresponding software processes and
analyzes the fluorescence data; (vi) the assay is rapid, as results are
available just one minute after thermal cycling is complete; (vii) the 
sample throughput of the method is high, since 96 reactions can be 
analyzed in 2 hr. 

Here, we applied this semi-automated procedure to determine 
the copy numbers of the 3 most frequently amplified genes in breast 
tumors (myc, ccnd1 and erbB2), as well as 2 genes (alb and app)
located in a chromosome region in which no genetic changes have
been observed in breast tumors. The results for 108 breast tumors
were compared with previous Southern-blot data for the same
samples. 



MATERIAL AND METHODS 
Tumor and blood samples 

Samples were obtained from 108 primary breast tumors removed
surgically from patients at the Centre René Huguenin; none of the
patients had undergone radiotherapy or chemotherapy. Immediately
after surgery, the tumor samples were placed in liquid
nitrogen until extraction of high-molecular-weight DNA. Patients
were included in this study if the tumor sample used for DNA
preparation contained more than 60% of tumor cells (histological 
analysis). A blood sample was also taken from 18 of the same 
patients. 

DNA was extracted from tumor tissue and blood leukocytes 
according to standard methods. 

Real-time PCR 

Theoretical basis. Reactions are characterized by the point 
during cycling when amplification of the PCR product is first 
detected, rather than by the amount of PCR product accumulated 
after a fixed number of cycles. The higher the starting copy number 
of the genomic DNA target, the earlier a significant increase in 
fluorescence is observed. The parameter Ct (threshold cycle) is
defined as the fractional cycle number at which the fluorescence
generated by cleavage of the probe passes a fixed threshold above
baseline. The target gene copy number in unknown samples is
quantified by measuring Ct and by using a standard curve to
determine the starting copy number. The precise amount of
genomic DNA (based on optical density) and its quality (i.e., lack
of extensive degradation) are both difficult to assess. We therefore
also quantified a control gene (alb) mapping to chromosome region
4q11-q13, in which no genetic alterations have been found in
breast-tumor DNA by means of CGH (Kallioniemi et al., 1994).
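
The definition of the threshold cycle as a fractional cycle number can be made concrete with a short Python sketch. It is a minimal illustration, assuming linear interpolation between the two cycles that bracket the threshold; the function name is hypothetical and the algorithm used by the instrument software is not described in the text.

```python
def threshold_cycle(delta_rn, threshold):
    """Return the fractional cycle at which the signal first crosses `threshold`.

    delta_rn[i] is the baseline-corrected signal measured after cycle i + 1.
    Returns None when the threshold is never reached, as would be expected
    for a no-template control.
    """
    for i in range(1, len(delta_rn)):
        before, after = delta_rn[i - 1], delta_rn[i]
        if before < threshold <= after:
            # Assumed linear interpolation between cycle i and cycle i + 1.
            return i + (threshold - before) / (after - before)
    return None
```

A sample with more starting copies crosses the threshold earlier and therefore yields a smaller Ct.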

Thus, the ratio of the copy number of the target gene to the copy 
number of the alb gene normalizes the amount and quality of 
genomic DNA. The ratio defining the level of amplification is
termed "N", and is determined as follows:

$$N = \frac{\text{copy number of target gene (app, myc, ccnd1, erbB2)}}{\text{copy number of reference gene (alb)}}$$

Primers, probes, reference human genomic DNA and PCR 
consumables. Primers and probes were chosen with the assistance 
of the computer programs Oligo 4.0 (National Biosciences, Ply- 
mouth, MN), EuGene (Daniben Systems, Cincinnati, OH) and Primer 
Express (Perkin-Elmer Applied Biosystems, Foster City, CA). 

Primers were purchased from DNAgency (Malvern, PA) and 
probes from Perkin-Elmer Applied Biosystems. 

Nucleotide sequences for the oligonucleotide hybridization 
probes and primers are available on request. 

The TaqMan PCR Core reagent kit, MicroAmp optical tubes, 
and MicroAmp caps were from Perkin-Elmer Applied Biosystems. 

Standard-curve construction. The kinetic method requires a 
standard curve. The latter was constructed with serial dilutions of 
specific PCR products, according to Piatak et al. (1993). In
practice, each specific PCR product was obtained by amplifying 20
ng of a standard human genomic DNA (Boehringer, Mannheim,
Germany) with the same primer pairs as those used later for
real-time quantitative PCR. The 5 PCR products were purified
using MicroSpin S-400 HR columns (Pharmacia, Uppsala,
Sweden), electrophoresed through an acrylamide gel and stained with
ethidium bromide to check their quality. The PCR products were
then quantified spectrophotometrically and pooled, and serially
diluted 10-fold in mouse genomic DNA (Clontech, Palo Alto, CA)
at a constant concentration of 2 ng/µl. The standard curve used for
real-time quantitative PCR was based on serial dilutions of the pool
of PCR products ranging from 10⁻⁷ (10⁵ copies of each gene) to
10⁻¹⁰ (10² copies). This series of diluted PCR products was
aliquoted and stored at -80°C until use.

The standard curve was validated by analyzing 2 known 
quantities of calibrator human genomic DNA (20 ng and 50 ng). 
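
How a standard curve of this kind converts a measured Ct into a starting copy number can be sketched as follows. This is a hedged illustration: the function names are hypothetical, the fit is an ordinary least-squares line of Ct against log10 copy number, and the Ct values are invented for the example rather than taken from the study.

```python
import math


def fit_standard_curve(copies, cts):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the fitted line to estimate the starting copy number."""
    return 10 ** ((ct - intercept) / slope)


# Standards from 10^5 to 10^2 copies, as in the text; Ct values are hypothetical.
standards = [1e5, 1e4, 1e3, 1e2]
example_cts = [22.0, 25.3, 28.6, 31.9]
slope, intercept = fit_standard_curve(standards, example_cts)
```

The slope is negative because each 10-fold increase in starting copies shortens the time needed to reach the threshold.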

PCR amplification. Amplification mixes (50 µl) contained the
sample DNA (around 20 ng, around 6600 copies of disomic genes),
10X TaqMan buffer (5 µl), 200 µM dATP, dCTP, dGTP, and 400
µM dUTP, 5 mM MgCl2, 1.25 units of AmpliTaq Gold, 0.5 units of
AmpErase uracil N-glycosylase (UNG), 200 nM each primer and
100 nM probe. The thermal cycling conditions comprised 2 min at
50°C and 10 min at 95°C. Thermal cycling consisted of 40 cycles at
95°C for 15 s and 65°C for 1 min. Each assay included: a standard
curve (from 10⁵ to 10² copies) in duplicate, a no-template control,
20 ng and 50 ng of calibrator human genomic DNA (Boehringer) in 
triplicate, and about 20 ng of unknown genomic DNA in triplicate 
(26 samples can thus be analyzed on a 96-well microplate). All 
samples with a coefficient of variation (CV) higher than 10% were 
retested. 
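
The retest rule can be expressed as a small check on each set of replicates. The sketch below is illustrative: the function names are hypothetical, and it assumes the CV is the sample standard deviation divided by the mean of the replicate values, which the text does not spell out.

```python
from statistics import mean, stdev


def coefficient_of_variation(replicates):
    """CV in percent: sample standard deviation relative to the mean."""
    return 100.0 * stdev(replicates) / mean(replicates)


def needs_retest(replicates, max_cv_percent=10.0):
    """Flag a sample whose replicate CV exceeds the 10% limit given in the text."""
    return coefficient_of_variation(replicates) > max_cv_percent
```

For example, hypothetical triplicates of 80, 100 and 120 copies have a CV of 20% and would be retested, whereas 98, 100 and 102 would not.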

All reactions were performed in the ABI Prism 7700 Sequence 
Detection System (Perkin-Elmer Applied Biosystems), which 
detects the signal from the fluorogenic probe during PCR. 

Equipment for real-time detection. The 7700 system has a 
built-in thermal cycler and a laser directed via fiber optical cables 
to each of the 96 sample wells. A charge-coupled-device (CCD)
camera collects the emission from each sample and the data are
analyzed automatically. The software accompanying the 7700
system calculates Ct and determines the starting copy number in the
samples.

Determination of gene amplification. Gene amplification was
calculated as described above. Only samples with an N value
higher than 2 were considered to be amplified.



RESULTS 

To validate the method, real-time PCR was performed on 
genomic DNA extracted from 108 primary breast tumors, and 18 
normal leukocyte DNA samples from some of the same patients. 
The target genes were the myc, ccnd1, and erbB2 proto-oncogenes,
and the β-amyloid precursor protein gene (app), which maps to a
chromosome region (21q21.2) in which no genetic alterations have
been found in breast tumors (Kallioniemi et al., 1994). The
reference disomic gene was the albumin gene (alb, chromosome
4q11-q13).



Validation of the standard curve and dynamic range 
of real-time PCR 

The standard curve was constructed from PCR products serially 
diluted in genomic mouse DNA at a constant concentration of 
2 ng/µl. It should be noted that the 5 primer pairs chosen to analyze
the 5 target genes do not amplify genomic mouse DNA (data not
shown). Figure 1 shows the real-time PCR standard curve for the
alb gene. The dynamic range was wide (at least 4 orders of
magnitude), with samples containing as few as 10² copies or as
many as 10⁵ copies.

Copy-number ratio of the 2 reference genes (app and alb)

The app to alb copy-number ratio was determined in 18 normal
leukocyte DNA samples and all 108 primary breast-tumor DNA
samples. We selected these 2 genes because they are located in 2
chromosome regions (app, 21q21.2; alb, 4q11-q13) in which no
obvious genetic changes (including gains or losses) have been
observed in breast cancers (Kallioniemi et al., 1994). The ratio for
the 18 normal leukocyte DNA samples fell between 0.7 and 1.3
(mean 1.02 ± 0.21), and was similar for the 108 primary
breast-tumor DNA samples (0.6 to 1.6, mean 1.06 ± 0.25), confirming
that alb and app are appropriate reference disomic genes for
breast-tumor DNA. The low range of the ratios also confirmed that
the nucleotide sequences chosen for the primers and probes were
not polymorphic, as mismatches of their primers or probes with the
subject's DNA would have resulted in differential amplification.

Figure 1 - Albumin (alb) gene dosage by real-time PCR. Top: Amplification plots for reactions with starting alb gene copy number ranging
from 10⁵ (A9), 10⁴ (A7), 10³ (A4) to 10² (A2) and a no-template control (A1). Cycle number is plotted vs. change in normalized reporter signal
(ΔRn). For each reaction tube, the fluorescence signal of the reporter dye (FAM) is divided by the fluorescence signal of the passive reference dye
(ROX), to obtain a ratio defined as the normalized reporter signal (Rn). ΔRn represents the normalized reporter signal (Rn) minus the baseline
signal established in the first 15 PCR cycles. ΔRn increases during PCR as alb PCR product copy number increases until the reaction reaches a
plateau. Ct (threshold cycle) represents the fractional cycle number at which a significant increase in Rn above a baseline signal (horizontal black
line) can first be detected. Two replicate plots were performed for each standard sample, but the data for only one are shown here. Bottom:
Standard curve plotting log starting copy number vs. Ct (threshold cycle). The black dots represent the data for standard samples plotted in
duplicate and the red dots the data for unknown genomic DNA samples plotted in triplicate. The standard curve shows 4 orders of linear dynamic
range.

myc, ccnd1 and erbB2 gene dose in normal leukocyte DNA

To determine the cut-off point for gene amplification in breast- 
cancer tissue, 18 normal leukocyte DNA samples were tested for 
the gene dose (N), calculated as described in "Material and 
Methods". The N value of these samples ranged from 0.5 to 1.3 
(mean 0.84 ± 0.22) for myc, 0.7 to 1.6 (mean 1.06 ± 0.23) for 
ccnd1 and 0.6 to 1.3 (mean 0.91 ± 0.19) for erbB2. Since N values
for myc, ccnd1 and erbB2 in normal leukocyte DNA consistently
fell between 0.5 and 1.6, values of 2 or more were considered to
represent gene amplification in tumor DNA. 
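
The cut-off can be written as a simple classification of the gene dose N. The sketch below is illustrative and the function name and category labels are hypothetical; the lower bound of 0.5 is taken from the observed normal range and from the N < 0.5 criterion the text applies to reduced erbB2 copy number.

```python
def classify_gene_dose(n):
    """Classify a gene dose N using the cut-offs stated in the text."""
    if n >= 2.0:
        return "amplified"   # values of 2 or more
    if n < 0.5:
        return "reduced"     # at least a 50% decrease in copy number
    return "normal"          # within the range seen in normal leukocyte DNA
```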

myc, ccnd1 and erbB2 gene dose in breast-tumor DNA

myc, ccnd1 and erbB2 gene copy numbers in the 108 primary
breast tumors are reported in Table I. Extra copies of ccnd1 were
more frequent (23%, 25/108) than extra copies of erbB2 (15%,
16/108) and myc (10%, 11/108), and ranged from 2 to 18.6 for
ccnd1, 2 to 15.1 for erbB2, and only 2 to 4.6 for the myc gene.
Figure 2 and Table II represent tumors in which the ccnd1 gene was
amplified 16-fold (T145), 6-fold (T133) and non-amplified (T118).
The 3 genes were never found to be co-amplified in the same tumor.
erbB2 and ccnd1 were co-amplified in only 3 cases, myc and ccnd1
in 2 cases and myc and erbB2 in 1 case. This favors the hypothesis
that gene amplifications are independent events in breast cancer.
Interestingly, 5 tumors showed a decrease of at least 50% in the
erbB2 copy number (N < 0.5), suggesting that they bore deletions
of the 17q21 region (the site of erbB2). No such decrease in copy
number was observed with the other 2 proto-oncogenes. 
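
As a rough arithmetic check of the independence argument (not an analysis performed in the study), the number of co-amplified tumors expected under independence can be computed from the reported frequencies. The names below are hypothetical and no significance test is implied.

```python
TOTAL_TUMORS = 108
# Number of tumors with extra copies of each gene, as reported in the text.
amplified_counts = {"myc": 11, "ccnd1": 25, "erbB2": 16}
# Observed numbers of tumors with pairwise co-amplification.
observed_pairs = {("erbB2", "ccnd1"): 3, ("myc", "ccnd1"): 2, ("myc", "erbB2"): 1}


def expected_co_amplified(gene_a, gene_b):
    """Expected count of tumors amplified for both genes if the events are independent."""
    return amplified_counts[gene_a] * amplified_counts[gene_b] / TOTAL_TUMORS


for (gene_a, gene_b), observed in observed_pairs.items():
    expected = expected_co_amplified(gene_a, gene_b)
    print(f"{gene_a} + {gene_b}: observed {observed}, expected {expected:.1f}")
```

Under this simple model the expected counts (about 3.7, 2.5 and 1.6) are close to the observed 3, 2 and 1, which is consistent with the hypothesis stated above.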

Comparison of gene dose determined by real-time quantitative
PCR and Southern-blot analysis

Southern-blot analysis of myc, ccnd1 and erbB2 amplifications
had previously been done on the same 108 primary breast tumors. A
perfect correlation between the results of real-time PCR and 
Southern blot was obtained for tumors with high copy numbers
(N > 5). However, there were cases (1 myc, 6 ccnd1 and 4 erbB2)
in which real-time PCR showed gene amplification whereas 
Southern-blot did not, but these were mainly cases with low extra 
copy numbers (N from 2 to 2.9). 

DISCUSSION 

The clinical applications of gene amplification assays are 
currently limited, but would certainly increase if a simple, standard- 
ized and rapid method were perfected. Gene amplification status 
has been studied mainly by means of Southern blotting, but this 
method is not sensitive enough to detect low-level gene amplifica- 
tion nor accurate enough to quantify the full range of amplification 
values. Southern blotting is also time-consuming, uses radioactive
reagents and requires relatively large amounts of high-quality
genomic DNA, which means it cannot be used routinely in many
laboratories. An amplification step is therefore required to determine
the copy number of a given target gene from minimal
quantities of tumor DNA (small, early-stage tumors, cytopuncture
specimens or formalin-fixed, paraffin-embedded tissues).

TABLE I - DISTRIBUTION OF AMPLIFICATION LEVEL (N) FOR myc,
ccnd1 AND erbB2 GENES IN 108 HUMAN BREAST TUMORS

                      Amplification level (N)
Gene      <0.5        0.5-1.9       2-4.9         ≥5
myc       0           97 (89.8%)    11 (10.2%)    0
ccnd1     0           83 (76.9%)    17 (15.7%)    8 (7.4%)
erbB2     5 (4.6%)    87 (80.6%)    8 (7.4%)      8 (7.4%)

In this study, we validated a PCR method developed for the 
quantification of gene over-representation in tumors. The method, 
based on real-time analysis of PCR amplification, has several 
advantages over other PCR-based quantitative assays such as 
competitive quantitative PCR (Celi et al., 1994). First, the real-time
PCR method is performed in a closed-tube system, avoiding the
risk of contamination by amplified products. Re-amplification of
carryover PCR products in subsequent experiments can also be
prevented by using the enzyme uracil N-glycosylase (UNG)
(Longo et al., 1990). The second advantage is the simplicity and
rapidity of sample analysis, since no post-PCR manipulations are 
required. Our results show that the automated method is reliable. 
We found it possible to determine, in triplicate, the number of 
copies of a target gene in more than 100 tumors per day. Third, the
system has a linear dynamic range of at least 4 orders of magnitude, 
meaning that samples do not have to contain equal starting amounts 
of DNA. This technique should therefore be suitable for analyzing 
formalin-fixed, paraffin-embedded tissues. Fourth, and above all, 
real-time PCR makes DNA quantification much more precise and 
reproducible, since it is based on Ct values rather than end-point
measurement of the amount of accumulated PCR product. Indeed,
the ABI Prism 7700 Sequence Detection System enables Ct to be
calculated when PCR amplification is still in the exponential phase
and when none of the reaction components is rate-limiting. The
within-run CV of the Ct value for calibrator human DNA (5
replicates) was always below 5%, and the between-assay precision
in 5 different runs was always below 10% (data not shown). In
addition, the use of a standard curve is not absolutely necessary,
since the copy number can be determined simply by comparing the
Ct ratio of the target gene with that of reference genes. The results
obtained by the 2 methods (with and without a standard curve) are 
similar in our experiments (data not shown). Moreover, unlike 
competitive quantitative PCR, real-time PCR does not require an 
internal control (the design and storage of internal controls and the 
validation of their amplification efficiency is laborious). 
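
One common way to estimate gene dose without a standard curve is the comparative Ct calculation, sketched below. It is an assumption that this corresponds to the comparison the authors describe; the function name and parameters are hypothetical, and the calculation presumes that the target and reference assays amplify with the same efficiency (a doubling per cycle when `efficiency` is 1.0).

```python
def gene_dose_from_ct(ct_target_tumor, ct_ref_tumor,
                      ct_target_calibrator, ct_ref_calibrator,
                      efficiency=1.0):
    """Gene dose of a tumor relative to calibrator DNA, from Ct values only.

    Assumes equal amplification efficiency for the target and reference genes.
    """
    delta_tumor = ct_target_tumor - ct_ref_tumor
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    delta_delta_ct = delta_tumor - delta_calibrator
    return (1.0 + efficiency) ** (-delta_delta_ct)
```

With these assumptions, a target gene that reaches the threshold 2 cycles earlier than the reference gene in tumor DNA, but at the same cycle in calibrator DNA, corresponds to a 4-fold gene dose.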

The only potential disadvantage of real-time PCR, like all other
PCR-based methods and solid-matrix blotting techniques (Southern
blots and dot blots), is that it cannot avoid dilution artifacts
inherent in the extraction of DNA from tumor cells contained in
heterogeneous tissue specimens. Only FISH and immunohistochemistry
can measure alterations on a cell-by-cell basis (Pauletti et al.,
1996; Slamon et al., 1989). However, FISH requires expensive
equipment and trained personnel and is also time-consuming. 
Moreover, FISH does not assess gene expression and therefore 
cannot detect cases in which the gene product is over-expressed in 
the absence of gene amplification, which will be possible in the 
future by real-time quantitative RT-PCR. Immunohistochemistry is 
subject to considerable variations in the hands of different teams, 
owing to alterations of target proteins during the procedure, the 
different primary antibodies and fixation methods used and the 
criteria used to define positive staining. 

The results of this study are in agreement with those reported in
the literature. (i) Chromosome regions 4q11-q13 and 21q21.2
(which bear alb and app, respectively) showed no genetic
alterations in the breast-cancer samples studied here, in keeping with
the results of CGH (Kallioniemi et al., 1994). (ii) We found that
amplifications of these 3 oncogenes were independent events, as
reported by other teams (Berns et al., 1992; Borg et al., 1992). (iii)
The frequency and degree of myc amplification in our breast tumor
DNA series were lower than those of ccnd1 and erbB2 amplification,
confirming the findings of Borg et al. (1992) and Courjal et al.
(1997). (iv) The maxima of ccnd1 and erbB2 over-representation
were 18-fold and 15-fold, also in keeping with earlier results (about
30-fold maximum) (Berns et al., 1992; Borg et al., 1992; Courjal et
al., 1997). (v) The erbB2 copy numbers obtained with real-time
PCR were in good agreement with data obtained with other
quantitative PCR-based assays in terms of the frequency and
degree of amplification (An et al., 1995; Deng et al., 1996; Valeron
et al., 1996). Our results also correlate well with those recently
published by Gelmini et al. (1997), who used the TaqMan system to
measure erbB2 amplification in a small series of breast tumors
(n = 25), but with an instrument (LS-50B luminescence
spectrometer, Perkin-Elmer Applied Biosystems) which only allows
end-point measurement of fluorescence intensity. Here we report myc
and ccnd1 gene dosage in breast cancer by means of quantitative
PCR. (vi) We found a high degree of concordance between
real-time quantitative PCR and Southern blot analysis in terms of
gene amplification, especially for samples with high copy numbers
(>5-fold). The slightly higher frequency of gene amplification
(especially ccnd1 and erbB2) observed by means of real-time
quantitative PCR as compared with Southern-blot analysis may be
explained by the higher sensitivity of the former method. However,
we cannot rule out the possibility that some tumors with a few extra
gene copies observed in real-time PCR had additional copies of an
arm or a whole chromosome (trisomy, tetrasomy or polysomy)
rather than true gene amplification. These 2 types of genetic
alteration (polysomy and gene amplification) could be easily
distinguished in the future by using an additional probe located on
the same chromosome arm, but some distance from the target gene.
It is noteworthy that high gene copy numbers have the greatest
prognostic significance in breast carcinoma (Borg et al., 1992;
Slamon et al., 1987).

Figure 2 - ccnd1 and alb gene dosage by real-time PCR in 3 breast tumor samples: T118 (E12, C6, black squares), T133 (G11, B4, red squares)
and T145 (A8, C8, blue squares). Given the Ct of each sample, the initial copy number is inferred from the standard curve obtained during the same
experiment. Triplicate plots were performed for each tumor sample, but the data for only one are shown here. The results are shown in Table II.

TABLE II - EXAMPLES OF ccnd1 GENE DOSAGE RESULTS
FROM 3 BREAST TUMORS¹

                    ccnd1                            alb
Tumor    Copy number    Mean      SD      Copy number    Mean     SD     Nccnd1/alb
T118         4525                             4223
             4605       4603      77          4365       4325     89        1.06
             4678                             4387
T133        59821                             9787
            61659      61100    1111         10092      10137    375        6.03
            61821                            10533
T145       128563                             7321
           125892     125392    3448          7762       7672    316       16.34
           121722                             7933

¹ For each sample, 3 replicate experiments were performed and the mean
and the standard deviation (SD) was determined. The level of ccnd1 gene
amplification (Nccnd1/alb) is determined by dividing the average ccnd1
copy number value by the average alb copy number value.
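
The gene-dose values in Table II can be reproduced from the replicate copy numbers with a few lines of Python. This is a sketch for illustration; the dictionary layout and function name are hypothetical, while the copy numbers are those listed in Table II.

```python
from statistics import mean, stdev

# Replicate copy numbers for the 3 tumors, as listed in Table II.
table_ii = {
    "T118": {"ccnd1": [4525, 4605, 4678], "alb": [4223, 4365, 4387]},
    "T133": {"ccnd1": [59821, 61659, 61821], "alb": [9787, 10092, 10533]},
    "T145": {"ccnd1": [128563, 125892, 121722], "alb": [7321, 7762, 7933]},
}


def gene_dose(target_copies, reference_copies):
    """N: mean target copy number divided by mean reference copy number."""
    return mean(target_copies) / mean(reference_copies)


for tumor, copies in table_ii.items():
    n = gene_dose(copies["ccnd1"], copies["alb"])
    sd = stdev(copies["ccnd1"])
    print(f"{tumor}: N = {n:.2f}, ccnd1 SD = {sd:.0f}")
```

Because the ratio uses the alb copy number measured in the same DNA preparation, differences in the amount of input DNA between tumors cancel out.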

Finally, this technique can be applied to the detection of gene 
deletion as well as gene amplification. Indeed, we found a 
decreased copy number of erbB2 (but not of the other 2
proto-oncogenes) in several tumors; erbB2 is located in a chromosome
region (17q21) reported to contain both deletions and amplifica- 
tions in breast cancer (Bieche and Lidereau, 1995). 

In conclusion, gene amplification in various cancers can be used 
as a marker of pre-neoplasia, also for early diagnosis of cancer, 
staging, prognostication and choice of treatment. Southern blotting 
is not sufficiently sensitive, and FISH is lengthy and complex. 
Real-time quantitative PCR overcomes both these limitations, and 
is a sensitive and accurate method of analyzing large numbers of 
samples in a short time. It should find a place in routine clinical 
gene dosage.

true and that all statements made on information or belief are believed to be true,
and further that these statements were made with the knowledge that willful false
statements and the like so made are punishable by fine or imprisonment, or both,
under Section 1001 of Title 18 of the United States Code and that such willful
statements may jeopardize the validity of the application or any patent issued
thereon.

Genome-wide Study of Gene Copy Numbers,
Transcripts, and Protein Levels in Pairs of
Non-invasive and Invasive Human Transitional
Cell Carcinomas*

Torben F. Ørntoft*§, Thomas Thykjaer¶, Frederic M. Waldman||, Hans Wolf**,
and Julio E. Celis**



Gain and loss of chromosomal material is characteristic
of bladder cancer, as well as malignant transformation in
general. The consequences of these changes at both the
transcription and translation levels is at present unknown
partly because of technical limitations. Here we have
attempted to address this question in pairs of non-invasive
and invasive human bladder tumors using a combination
of technology that included comparative genomic
hybridization, high density oligonucleotide array-based
monitoring of transcript levels (5600 genes), and high resolution
two-dimensional gel electrophoresis. The results showed
that there is a gene dosage effect that in some cases
superimposes on other regulatory mechanisms. This effect
depended (p < 0.015) on the magnitude of the comparative
genomic hybridization change. In general (18 of
23 cases), chromosomal areas with more than 2-fold gain
of DNA showed a corresponding increase in mRNA
transcripts. Areas with loss of DNA, on the other hand,
showed either reduced or unaltered transcript levels.
Because most proteins resolved by two-dimensional gels
are unknown it was only possible to compare mRNA and
protein alterations in relatively few cases of well focused
abundant proteins. With few exceptions we found a good
correlation (p < 0.005) between transcript alterations and
protein levels. The implications, as well as limitations,
of the approach are discussed. Molecular & Cellular
Proteomics 1:37-45, 2002.

Aneuploidy is a common feature of most human cancers
(1), but little is known about the genome-wide effect of this
phenomenon at both the transcription and translation levels.
High throughput array studies of the breast cancer cell line
BT474 has suggested that there is a correlation between
DNA copy numbers and gene expression in highly amplified
areas (2), and studies of individual genes in solid tumors
have revealed a good correlation between gene dose and
mRNA or protein levels in the case of c-erb-B2, cyclin D1,
ems1, and N-myc (3-5). However, a high cyclin D1 protein
expression has been observed without simultaneous
amplification (4), and a low level of c-myc copy number
increase was observed without concomitant c-myc protein
overexpression (6).

In human bladder tumors, karyotyping, fluorescent in situ 
hybridization, and comparative genomic hybridization (CGH)¹ 
have revealed chromosomal aberrations that seem to be 
characteristic of certain stages of disease progression. In the 
case of non-invasive pTa transitional cell carcinomas (TCCs), 
this includes loss of chromosome 9 or parts of it as well as 
loss of Y in males. In minimally invasive pT1 TCCs, the 
following alterations have been reported: 2q-, 11p-, 1q+, 
11q13+, 17q+, and 20q+ (7-12). It has been suggested that 
these regions harbor tumor suppressor genes and oncogenes; 
however, the large chromosomal areas involved often 
contain many genes, making meaningful predictions of the 
functional consequences of losses and gains very difficult. 

In this investigation we have combined genome-wide tech- 
nology for detecting genomic gains and losses (CGH) with 
gene expression profiling techniques (microarrays and pro- 
teomics) to determine the effect of gene copy number on 
transcript and protein levels in pairs of non-invasive and 
invasive human bladder TCCs. 

EXPERIMENTAL PROCEDURES 

Material— Bladder tumor biopsies were sampled after informed 
consent was obtained and after removal of tissue for routine pathology 
examination. By light microscopy tumors 335 and 532 were 
staged by an experienced pathologist as pTa (superficial papillary), 
grade I and II, respectively. Tumors 733 and 827 were staged as pT1 
(invasive into submucosa); 733 was staged as solid, and 827 was 
staged as papillary, both grade III. 

¹ The abbreviations used are: CGH, comparative genomic hybridization; 
TCC, transitional cell carcinoma; LOH, loss of heterozygosity; 
PA-FABP, psoriasis-associated fatty acid-binding protein; 2D, 
two-dimensional. 

Fig. 1. DNA copy number and mRNA expression level. Shown from left to right are chromosome (Chr.), CGH profiles, gene location and 
expression level of specific genes, and overall expression level along the chromosome. A, expression of mRNA in invasive tumor 733 as 
compared with the non-invasive counterpart tumor 335. B, expression of mRNA in invasive tumor 827 compared with the non-invasive 
counterpart tumor 532. The average fluorescent signal ratio between tumor DNA and normal DNA is shown along the length of the chromosome 
(left). The bold curve in the ratio profile represents a mean of four chromosomes and is surrounded by thin curves indicating one standard 
deviation. The central vertical line (broken) indicates a ratio value of 1 (no change), and the vertical lines next to it (dotted) indicate a ratio of 
0.5 (left) and 2.0 (right). In chromosomes where the non-invasive tumor 335 used for comparison showed alterations in DNA content, the ratio 
profile of that chromosome is shown to the right of the invasive tumor profile. The colored bars represent one gene each, identified by the 
running numbers above the bars (the name of the gene can be seen at www.MDLDK/sdata.html). The bars indicate the purported location of 
the gene, and the colors indicate the expression level of the gene in the invasive tumor compared with the non-invasive counterpart: >2-fold 
increase (black), >2-fold decrease (blue), no significant change (orange). The bar to the far right, entitled Expression, shows the resulting change 
in expression along the chromosome; the colors indicate that at least half of the genes were up-regulated (black), at least half of the genes 
down-regulated (blue), or more than half of the genes are unchanged (orange). If a gene was absent in one of the samples and present in 
another, it was regarded as more than a 2-fold change. A 2-fold level was chosen as this corresponded to one standard deviation in a double 
determination of ~1800 genes. Centromeres and heterochromatic regions were excluded from data analysis. 

mRNA Preparation— Tissue biopsies, obtained fresh from surgery, 
were embedded immediately in a sodium-guanidinium thiocyanate 
solution and stored at -80 °C. Total RNA was isolated using the 
RNAzol B RNA isolation method (WAK-Chemie Medical GMBH). 
Poly(A)+ RNA was isolated by an oligo(dT) selection step (Oligotex 
mRNA kit; Qiagen). 

cRNA Preparation— 4 μg of mRNA was used as starting material. 
The first and second strand cDNA synthesis was performed using the 
SuperScript choice system (Invitrogen) according to the manufacturer's 
instructions but using an oligo(dT) primer containing a T7 RNA 
polymerase binding site. Labeled cRNA was prepared using the 
MEGAscript in vitro transcription kit (Ambion). Biotin-labeled CTP and 
UTP (Enzo) were used, together with unlabeled NTPs in the reaction. 
Following the in vitro transcription reaction, the unincorporated 
nucleotides were removed using RNeasy columns (Qiagen). 

Array Hybridization and Scanning— Array hybridization and scanning 
were modified from a previous method (13). 10 μg of cRNA was 
fragmented at 94 °C for 35 min in buffer containing 40 mM Tris 
acetate, pH 8.1, 100 mM KOAc, 30 mM MgOAc. Prior to hybridization, 
the fragmented cRNA in a 6x SSPE-T hybridization buffer (1 M NaCl, 
10 mM Tris, pH 7.6, 0.005% Triton) was heated to 95 °C for 5 min, 
subsequently cooled to 40 °C, and loaded onto the Affymetrix probe 
array cartridge. The probe array was then incubated for 16 h at 40 °C 
at constant rotation (60 rpm). The probe array was exposed to 10 
washes in 6x SSPE-T at 25 °C followed by 4 washes in 0.5x SSPE-T 
at 50 °C. The biotinylated cRNA was stained with a 
streptavidin-phycoerythrin conjugate, 10 μg/ml (Molecular Probes), in 6x SSPE-T 
for 30 min at 25 °C followed by 10 washes in 6x SSPE-T at 25 °C. The 
probe arrays were scanned at 560 nm using a confocal laser scanning 
microscope (made for Affymetrix by Hewlett-Packard). The readings 
from the quantitative scanning were analyzed by Affymetrix gene 
expression analysis software. 

Microsatellite Analysis— Microsatellite analysis was performed as 
described previously (14). Microsatellites were selected by use of 
www.ncbi.nlm.nih.gov/genemap98, and primer sequences were 
obtained from the genome data base at www.gdb.org. DNA was extracted 
from tumor and blood and amplified by PCR in a volume of 20 μl for 35 
cycles. The amplicons were denatured and electrophoresed for 3 h in an 
ABI Prism 377. Data were collected in the Gene Scan program for 
fragment analysis. Loss of heterozygosity was defined as less than 33% 
of one allele detected in tumor amplicons compared with blood. 
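
The 33% criterion above can be written as a small calculation. The sketch below is only an illustration: the text does not say how peak signals were normalized, so it assumes that each tumor allele peak is divided by the matching blood peak and that the weaker allele is then compared with the stronger one. The function names and the peak values are hypothetical.

```python
LOH_CUTOFF = 0.33  # "less than 33% of one allele", as stated in the text


def retained_fraction(tumor, blood):
    """Fraction of the weaker tumor allele that remains, relative to blood.

    `tumor` and `blood` are (allele_1, allele_2) peak signals for one
    heterozygous microsatellite. Normalizing each tumor peak by the
    matching blood peak is an assumption, not the authors' stated method.
    """
    t1, t2 = tumor
    b1, b2 = blood
    if min(b1, b2) <= 0:
        raise ValueError("blood sample must show both alleles")
    if min(t1, t2) < 0 or max(t1, t2) <= 0:
        raise ValueError("tumor sample must show at least one allele")
    r1 = t1 / b1
    r2 = t2 / b2
    return min(r1, r2) / max(r1, r2)


def has_loh(tumor, blood, cutoff=LOH_CUTOFF):
    """True when one allele falls below the cutoff in the tumor."""
    return retained_fraction(tumor, blood) < cutoff


# Hypothetical peak heights: allele 1 drops to 20% of its blood level.
print(has_loh(tumor=(200, 1000), blood=(1000, 1000)))
```

A marker that is homozygous in blood carries no information under this definition, which is why the sketch rejects it instead of returning a call.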

Proteomic Analysis— TCCs were minced into small pieces and 
homogenized in a small glass homogenizer in 0.5 ml of lysis solution. 
Samples were stored at -20 °C until use. The procedure for 2D gel 
electrophoresis has been described in detail elsewhere (15, 16). Gels 
were stained with silver nitrate and/or Coomassie Brilliant Blue. 
Proteins were identified by a combination of procedures that included 
microsequencing, mass spectrometry, two-dimensional gel Western 
immunoblotting, and comparison with the master two-dimensional gel 
image of human keratinocyte proteins; see biobase.dk/cgi-bin/cens. 

CGH— Hybridization of differentially labeled tumor and normal DNA 
to normal metaphase chromosomes was performed as described 
previously (10). Fluorescein-labeled tumor DNA (200 ng), Texas 
Red-labeled reference DNA (200 ng), and human Cot-1 DNA (20 μg) were 
denatured at 37 °C for 5 min and applied to denatured normal 
metaphase slides. Hybridization was at 37 °C for 2 days. After washing, 
the slides were counterstained with 0.15 μg/ml 
4,6-diamidino-2-phenylindole in an anti-fade solution. A second hybridization was 
performed for all tumor samples using fluorescein-labeled reference DNA 
and Texas Red-labeled tumor DNA (inverse labeling) to confirm the 
aberrations detected during the initial hybridization. Each CGH 
experiment also included a normal control hybridization using 
fluorescein- and Texas Red-labeled normal DNA. Digital image analysis was 
used to identify chromosomal regions with abnormal fluorescence 
ratios, indicating regions of DNA gains and losses. The average 
green:red fluorescence intensity ratio profiles were calculated using 
four images of each chromosome (eight chromosomes total) with 
normalization of the green:red fluorescence intensity ratio for the 
entire metaphase and background correction. Chromosome 
identification was performed based on 4,6-diamidino-2-phenylindole 
banding patterns. Only images showing uniform high intensity 
fluorescence with minimal background staining were analyzed. All 
centromeres, p arms of acrocentric chromosomes, and heterochromatic 
regions were excluded from the analysis. 
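
As a rough illustration of how a green:red ratio profile can be turned into a gain or loss call, the sketch below applies the criterion given in the legend to Fig. 2, namely that the ratio plus or minus one standard deviation lies outside the value of one. The image-analysis software used in the study is not described, so the function name and the example values are assumptions made for illustration.

```python
def call_cgh(mean_ratio, sd):
    """Classify one chromosomal region from its tumor:normal ratio profile.

    A region is called only when the whole interval mean +/- 1 SD lies on
    one side of 1.0 (the "no change" line in Fig. 1).
    """
    if sd < 0:
        raise ValueError("standard deviation cannot be negative")
    if mean_ratio - sd > 1.0:
        return "gain"
    if mean_ratio + sd < 1.0:
        return "loss"
    return "no change"


# Hypothetical profile values for three regions.
for mean_ratio, sd in [(1.4, 0.2), (0.7, 0.1), (1.1, 0.2)]:
    print(mean_ratio, sd, call_cgh(mean_ratio, sd))
```

Requiring the full interval to clear 1.0 keeps noisy profiles from being scored as aberrations, at the cost of missing small changes.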

RESULTS 

Comparative Genomic Hybridization—The CGH analysis 
identified a number of chromosomal gains and losses in the 
two invasive tumors (stage pT1, TCCs 733 and 827), whereas 
the two non-invasive papillomas (stage pTa, TCCs 335 and 
532) showed only 9p-, 9q22-q33-, and and 7+, 9q-, 
and Y-, respectively. Both invasive tumors showed changes 
(1q22-24+, 2q14.1-qter-, 3q12-q13.3-, 6q12-q22-, 
9q34+, 11q12-q13+, 17+, and 20q11.2-q12+) that are typical 
for their disease stage, as well as additional alterations, 
some of which are shown in Fig. 1. Areas with gains and 
losses deviated from the normal copy number to some extent, 
and the average numerical deviation from normal was 0.4-fold 
in the case of TCC 733 and 0.3-fold for TCC 827. The largest 
changes, amounting to at least a doubling of chromosomal 
content, were observed at 1q23 in TCC 733 (Fig. 1A) and 
20q12 in TCC 827 (Fig. 1B). 

Table I 

Correlation between alterations detected by CGH and by expression monitoring 

Top, CGH used as independent variable (if CGH alteration, what expression ratio was found); bottom, altered expression used as 
independent variable (if expression alteration, what CGH deviation was found). 

Top 

Tumor pair     CGH alterations   Expression change clusters                          Concordance 
733 vs. 335    13 Gain           10 Up-regulation, 0 Down-regulation, 3 No change    77% 
733 vs. 335    10 Loss           1 Up-regulation, 5 Down-regulation, 4 No change     50% 
827 vs. 532    10 Gain           8 Up-regulation, 0 Down-regulation, 2 No change     80% 
827 vs. 532    12 Loss           3 Up-regulation, 2 Down-regulation, 7 No change     17% 

Bottom 

Tumor pair     Expression change clusters   CGH alterations                  Concordance 
733 vs. 335    16 Up-regulation             11 Gain, 2 Loss, 3 No change     69% 
733 vs. 335    21 Down-regulation           1 Gain, 8 Loss, 12 No change     38% 
733 vs. 335    15 No change                 3 Gain, 3 Loss, 9 No change      60% 
827 vs. 532    17 Up-regulation             10 Gain, 5 Loss, 2 No change     59% 
827 vs. 532    9 Down-regulation            0 Gain, 3 Loss, 6 No change      33% 
827 vs. 532    21 No change                 1 Gain, 3 Loss, 17 No change     81% 
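
The concordance percentages in Table I follow directly from the listed counts: each is the number of concordant clusters divided by the row total. The short check below recomputes them; the tuple layout and variable names are illustrative choices, not part of the paper.

```python
# (tumor pair, independent variable, concordant count, other counts, reported %)
TABLE_I = [
    ("733 vs. 335", "CGH gain",             10, (0, 3),  77),
    ("733 vs. 335", "CGH loss",              5, (1, 4),  50),
    ("827 vs. 532", "CGH gain",              8, (0, 2),  80),
    ("827 vs. 532", "CGH loss",              2, (3, 7),  17),
    ("733 vs. 335", "up-regulation",        11, (2, 3),  69),
    ("733 vs. 335", "down-regulation",       8, (1, 12), 38),
    ("733 vs. 335", "no expression change",  9, (3, 3),  60),
    ("827 vs. 532", "up-regulation",        10, (5, 2),  59),
    ("827 vs. 532", "down-regulation",       3, (0, 6),  33),
    ("827 vs. 532", "no expression change", 17, (1, 3),  81),
]


def concordance(concordant, others):
    """Percentage of clusters that agree with the independent variable."""
    total = concordant + sum(others)
    return round(100 * concordant / total)


for pair, variable, hits, others, reported in TABLE_I:
    print(f"{pair:12s} {variable:22s} {concordance(hits, others):3d}% (reported {reported}%)")
```

Every recomputed value matches the printed table, which also confirms the row assignments used in the reconstruction.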

mRNA Expression in Relation to DNA Copy Number—The 
mRNA levels from the two invasive tumors (TCCs 827 and 
733) were compared with the two non-invasive counterparts 
(TCCs 532 and 335). This was done in two separate experi- 
ments in which we compared TCCs 733 to 335 and 827 to 
532, respectively, using two different scaling settings for the 
arrays to rule out scaling as a confounding parameter. Ap- 
proximately 1,800 genes that yielded a signal on the arrays 
were searched in the Unigene and Genemap data bases for 
chromosomal location, and those with a known location 
(1096) were plotted as bars covering their purported locus. In 
that way it was possible to construct a graphic presentation of 
DNA copy number and relative mRNA levels along the indi- 
vidual chromosomes (Fig. 1). 

For each mRNA a ratio was calculated between the level in 
the invasive versus the non-invasive counterpart. Bars, which 
represent chromosomal location of a gene, were color-coded 
according to the expression ratio, and only differences larger 
than 2-fold were regarded as informative (Fig. 1). The density 
of genes along the chromosomes varied, and areas containing 
only one gene were excluded from the calculations. The 
resolution of the CGH method is very low, and some of the 
outlier data may be because of the fact that the boundaries of 
the chromosomal aberrations are not known at high resolution. 
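
The scoring rules described here and in the legend to Fig. 1 can be summarized in a few lines. This is a sketch under stated assumptions: `None` stands for a transcript scored as absent, the tie-breaking order in `classify_region` (up before down) is an arbitrary choice the paper does not specify, and the function names are hypothetical. The signal values in the example are the array units quoted in the legend to Fig. 5 and in the Discussion.

```python
def classify_gene(invasive, noninvasive, cutoff=2.0):
    """Score one gene from its signal in the invasive and non-invasive tumor.

    None means the transcript was scored absent; absent versus present
    counts as more than a 2-fold change, as in the Fig. 1 legend.
    """
    if invasive is None and noninvasive is None:
        return "no change"
    if noninvasive is None:
        return "up"
    if invasive is None:
        return "down"
    ratio = invasive / noninvasive
    if ratio > cutoff:
        return "up"
    if ratio < 1.0 / cutoff:
        return "down"
    return "no change"


def classify_region(gene_calls):
    """Summarize a chromosomal area; areas with a single gene are excluded."""
    n = len(gene_calls)
    if n < 2:
        return None
    if 2 * gene_calls.count("up") >= n:      # at least half up-regulated
        return "up"
    if 2 * gene_calls.count("down") >= n:    # at least half down-regulated
        return "down"
    return "no change"


print(classify_gene(623, 15659))    # cytokeratin 13, TCC 827 vs. 532
print(classify_gene(None, 6151))    # cytokeratin 15, absent in TCC 827
print(classify_gene(2670, None))    # annexin II ligand, absent in non-invasive tumor
print(classify_region(["up", "up", "no change"]))
```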

Two sets of calculations were made from the data. For the 
first set we used CGH alterations as the independent variable 
and estimated the frequency of expression alterations in these 
chromosomal areas. In general, areas with a strong gain of 
chromosomal material contained a cluster of genes having 
increased mRNA expression. For example, chromosomes 
1q21-q25, 2p, and 9q showed a relative gain of more 
than 100% in DNA copy number that was accompanied by 
increased mRNA expression levels in the two tumor pairs (Fig. 
1). In most cases, chromosomal gains detected by CGH were 
accompanied by an increased level of transcripts in both 
TCCs 733 (77%) and 827 (80%) (Table I, top). Chromosomal 
losses, on the other hand, were not accompanied by 
decreased expression in several cases, and were often 
registered as having unaltered RNA levels (Table I, top). The 
inability to detect RNA expression changes in these cases was not 
because of fewer genes mapping to the lost regions (data not 
shown). 

In the second set of calculations we selected expression 
alterations above 2-fold as the independent variable and 
estimated the frequency of CGH alterations in these areas. As 
above, we found that increased transcript expression 
correlated with gain of chromosomal material (TCC 733, 69% and 
TCC 827, 59%), whereas reduced expression was often 
detected in areas with unaltered CGH ratios (Table I, bottom). 
Furthermore, as a control we looked at areas with no 
alteration in expression. No alteration was detected by CGH in 
most of these areas (TCC 733, 60% and TCC 827, 81%; see 
Table I, bottom). Because the ability to observe reduced or 
increased mRNA expression clustering to a certain 
chromosomal area clearly reflected the extent of copy number 
changes, we plotted the maximum CGH aberrations in the 
regions showing CGH changes against the ability to detect a 
change in mRNA expression as monitored by the 
oligonucleotide arrays (Fig. 2). For both tumors, TCC 733 (p < 0.015) and 
TCC 827 (p < 0.00003), a highly significant correlation was 
observed between the level of CGH ratio change (reflecting 
the DNA copy number) and alterations detected by the array 
based technology (Fig. 2). Similar data were obtained when 
areas with altered expression were used as independent 
variables. These areas correlated best with CGH when the CGH 
ratio deviated 1.6- to 2.0-fold (Table I, bottom) but mostly did 
not at lower CGH deviations. These data probably reflect that 
loss of an allele may only lead to a 50% reduction in 
expression level, which is at the cut-off point for detection of 
expression alterations. Gain of chromosomal material can occur to a 
much larger extent. 

Fig. 2. Correlation between maximum CGH aberration and the ability to detect expression change by oligonucleotide array 
monitoring. The aberration is shown as a numerical -fold change in ratio between invasive tumors 827 (▲) and 733 (♦) and their non-invasive 
counterparts 532 and 335. The expression change was taken from the Expression line to the right in Fig. 1, which depicts the resulting 
expression change for a given chromosomal region. At least half of the mRNAs from a given region have to be either up- or down-regulated 
to be scored as an expression change. All chromosomal arms in which the CGH ratio plus or minus one standard deviation was outside the 
ratio value of one were included. 
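
The asymmetry between losses and gains can be made concrete with a simple proportional model. The sketch below assumes, purely for illustration, that transcript level scales linearly with copy number from a normal dose of two copies; the paper states that regulation of individual genes is superimposed on any dosage effect, so this is not a model the authors propose. The function names are hypothetical.

```python
def expected_ratio(copies, normal_copies=2):
    """Expected expression ratio if transcript level tracked copy number."""
    return copies / normal_copies


def is_detectable(ratio, cutoff=2.0):
    """Only differences larger than 2-fold were regarded as informative."""
    return ratio > cutoff or ratio < 1.0 / cutoff


for copies in (1, 3, 4, 5, 6):
    ratio = expected_ratio(copies)
    print(copies, ratio, is_detectable(ratio))
```

Under this assumption, loss of one allele gives a ratio of exactly 0.5, which sits on the cut-off and is not scored, whereas a gain has to exceed four copies before it passes the same threshold.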

Microsatellite-based Detection of Minor Areas of Losses—In 
TCC 733, several chromosomal areas exhibiting DNA 
amplification were preceded or followed by areas with a 
normal CGH but reduced mRNA expression (see Fig. 1, TCC 733 
chromosome 1q32, 2p21, and 7q21 and q32, 9q34, and 
10q22). To determine whether these results were because of 
undetected loss of chromosomal material in these regions or 
because of other non-structural mechanisms regulating 
transcription, we examined two microsatellites positioned at 
chromosome 1q25-32 and two at chromosome 2p22. Loss of 
heterozygosity (LOH) was found at both 1q25 and at 2p22, 
indicating that minor deleted areas were not detected with the 
resolution of CGH (Fig. 3). Additionally, chromosome 2p in 
TCC 733 showed a CGH pattern of gain/no change/gain of 
DNA that correlated with transcript increase/decrease/increase. 
Thus, for the areas showing increased expression 
there was a correlation with the DNA copy number alterations 
(Fig. 1A). As indicated above, the mRNA decrease observed in 
the middle of the chromosomal gain was because of LOH, 
implying that one of the mechanisms for mRNA down-regulation 
may be regions that have undergone smaller losses of 
chromosomal material. However, this cannot be detected with 
the resolution of the CGH method. 

In both TCC 733 and TCC 827, the telomeric end of 
chromosome 11p showed a normal ratio in the CGH analysis; 
however, clusters of five and three genes, respectively, lost 
their expression. Two microsatellites (D11S1760, D11S922) 
positioned close to MUC2, IGF2, and cathepsin D indicated 
LOH as the most likely mechanism behind the loss of 
expression (data not shown). 

A reduced expression of mRNA observed in TCC 733 at 
chromosomes 3q24, 11p11, 12p12.2, 12q21.1, and 16q24 
and in TCC 827 at chromosome 11p15.5, 12p11, 15q11.2, 
and 18q12 was also examined for chromosomal losses using 
microsatellites positioned as close as possible to the gene loci 
showing reduced mRNA transcripts. Only the microsatellite 
positioned at 18q12 showed LOH (Fig. 3), suggesting that 
transcriptional down-regulation of genes in the other regions 
may be controlled by other mechanisms. 

Fig. 3. Microsatellite analysis of loss of heterozygosity. Tumor 
733 showing loss of heterozygosity at chromosome 1q25, detected 
(a) by D1S215 close to Hu class I histocompatibility antigen (gene 
number 36 in Fig. 1), (b) by D1S2735 close to cathepsin E (gene 
number 41 in Fig. 1), and (c) at chromosome 2p23 by D2S2251 close 
to general β-spectrin (gene number 11 on Fig. 1) and of (d) tumor 827 
showing loss of heterozygosity at chromosome 18q12 by D18S1118 
close to mitochondrial 3-oxoacyl-coenzyme A thiolase (gene number 
12 in Fig. 1). The upper curves show the electropherogram obtained 
from normal DNA from leukocytes (N), and the lower curves show the 
electropherogram from tumor DNA (T). In all cases one allele is 
partially lost in the tumor amplicon. 

Relation between Changes in mRNA and Protein Levels— 
2D-PAGE analysis, in combination with Coomassie Brilliant 
Blue and/or silver staining, was carried out on all four tumors 
using fresh biopsy material. 40 well resolved abundant known 
proteins migrating in areas away from the edges of the pH 
gradient, and having a known chromosomal location, were 
selected for analysis in the TCC pair 827/532. Proteins were 
identified by a combination of methods (see "Experimental 
Procedures"). In general there was a highly significant 
correlation (p < 0.005) between mRNA and protein alterations (Fig. 
4). Only one gene showed disagreement between transcript 
alteration and protein alteration. Except for a group of 
cytokeratins encoded by genes on chromosome 17 (Fig. 5) the 
analyzed proteins did not belong to a particular family. 26 well 
focused proteins whose genes had a known chromosomal 
location were detected in TCCs 733 and 335, and of these 19 
correlated (p < 0.005) with the mRNA changes detected using 
the arrays (Fig. 4). For example, PA-FABP was highly 
expressed in the non-invasive TCC 335 but lost in the invasive 
counterpart (TCC 733; see Fig. 5). The smaller number of 
proteins detected in both 733 and 335 was because of the 
smaller size of the biopsies that were available. 

Fig. 4. Correlation between protein levels as judged by 2D-PAGE 
and transcript ratio. For comparison proteins were divided in 
three groups, unaltered in level or up- or down-regulated (horizontal 
axis). The mRNA ratio as determined by oligonucleotide arrays was 
plotted for each gene (vertical axis). a, mRNAs that were scored as 
present in both tumors used for the ratio calculation; A, mRNAs that 
were scored as absent in the invasive tumors (along horizontal axis) or 
as absent in non-invasive reference (top of figure). Two different 
scalings were used to exclude scaling as a confounder; TCCs 827 
and 532 (AA) were scaled with background suppression, and TCCs 
733 and 335 (#p) were scaled without suppression. Both comparisons 
showed highly significant (p < 0.005) differences in mRNA ratios 
between the groups. Proteins shown were as follows: Group A (from 
left), phosphoglucomutase 1, glutathione transferase class μ number 
4, fatty acid-binding protein homologue, cytokeratin 15, and 
cytokeratin 13; B (from left), fatty acid-binding protein homologue, 28-kDa 
heat shock protein, cytokeratin 13, and calcyclin; C (from left), 
α-enolase, hnRNP B1, 28-kDa heat shock protein, 14-3-3-ε, and 
pre-mRNA splicing factor; D, mesothelial keratin K7 (type II); E (from 
top), glutathione S-transferase-π and mesothelial keratin K7 (type II); 
F (from top and left), adenylyl cyclase-associated protein, E-cadherin, 
keratin 19, calgizzarin, phosphoglycerate mutase, annexin IV, 
cytoskeletal γ-actin, hnRNP A1, integral membrane protein calnexin 
(IP90), hnRNP H, brain-type clathrin light chain-a, hnRNP F, 70-kDa 
heat shock protein, heterogeneous nuclear ribonucleoprotein A/B, 
translationally controlled tumor protein, liver 
glyceraldehyde-3-phosphate dehydrogenase, keratin 8, aldehyde reductase, and 
Na,K-ATPase β-1 subunit; G (from top and left), TCP20, calgizzarin, 
70-kDa heat shock protein, calnexin, hnRNP H, cytokeratin 15, ATP 
synthase, keratin 19, triosephosphate isomerase, hnRNP F, liver 
glyceraldehyde-3-phosphate dehydrogenase, glutathione 
S-transferase-π, and keratin 8; H (from left), plasma gelsolin, autoantigen 
calreticulin, thioredoxin, and NAD+-dependent 15-hydroxyprostaglandin 
dehydrogenase; I (from top), prolyl 4-hydroxylase β-subunit, 
cytokeratin 20, cytokeratin 17, prohibitin, and fructose 
1,6-biphosphatase; J, annexin II; K, annexin IV; L (from top and left), 90-kDa heat 
shock protein, prolyl 4-hydroxylase β-subunit, α-enolase, GRP 78, 
cyclophilin, and cofilin. 

Fig. 5. Comparison of protein and transcript levels in invasive 
and non-invasive TCCs. The upper part of the figure shows a 2D gel 
(left) and the oligonucleotide array (right) of TCC 532. The red 
rectangles on the upper gel highlight the areas that are compared below. 
Identical areas of 2D gels of TCCs 532 and 827 are shown below. 
Clearly, cytokeratins 13 and 15 are strongly down-regulated in TCC 
827 (red annotation). The tile on the array containing probes for 
cytokeratin 15 is enlarged below the array (red arrow) from TCC 532 
and is compared with TCC 827. The upper row of squares in each tile 
corresponds to perfect match probes; the lower row corresponds to 
mismatch probes containing a mutation (used for correction for 
unspecific binding). Absence of signal is depicted as black, and the 
higher the signal the lighter the color. A high transcript level was 
detected in TCC 532 (6151 units) whereas a much lower level was 
detected in TCC 827 (absence of signals). For cytokeratin 13, a high 
transcript level was also present in TCC 532 (15659 units), and a 
much lower level was present in TCC 827 (623 units). The 2D gels at 
the bottom of the figure (left) show levels of PA-FABP and 
adipocyte-FABP in TCCs 335 and 733 (invasive), respectively. Both proteins are 
down-regulated in the invasive tumor. To the right we show the array 
tiles for the PA-FABP transcript. A medium transcript level was 
detected in the case of TCC 335 (1277 units) whereas very low levels 
were detected in TCC 733 (166 units). IEF, isoelectric focusing. 

11 chromosomal regions where CGH showed aberrations 
that corresponded to the changes in transcript levels also 
showed corresponding changes in the protein level (Table II). 
These regions included genes that encode proteins that are 
found to be frequently altered in bladder cancer, namely 
cytokeratins 17 and 20, annexins II and IV, and the fatty 
acid-binding proteins PA-FABP and FBP1. Four of these 
proteins were encoded by genes in chromosome 17q, a 
frequently amplified chromosomal area in invasive bladder 
cancers. 

DISCUSSION 

Most human cancers have abnormal DNA content, having 
lost some chromosomal parts and gained others. The present 
study provides some evidence as to the effect of these gains 
and losses on gene expression in two pairs of non-invasive 
and invasive TCCs using high throughput expression arrays 
and proteomics, in combination with CGH. In general, the 
results showed that there is a clear individual regulation of the 
mRNA expression of single genes, which in some cases was 
superimposed by a DNA copy number effect. In most cases, 
genes located in chromosomal areas with gains often 
exhibited increased mRNA expression, whereas areas showing 
losses showed either no change or a reduced mRNA 
expression. The latter might be because of the fact that losses most 
often are restricted to loss of one allele, and the cut-off point 
for detection of expression alterations was a 2-fold change, 
thus being at the border of detection. In several cases, 
however, an increase or decrease in DNA copy number was 
associated with de novo occurrence or complete loss of 
transcript, respectively. Some of these transcripts could not be 
detected in the non-invasive tumor but were present at 
relatively high levels in areas with DNA amplifications in the 
invasive tumors (e.g. in TCC 733 transcript from cellular ligand of 
annexin II gene (chromosome 1q21) from absent to 2670 
arbitrary units; in TCC 827 transcript from small proline-rich 
protein 1 gene (chromosome 1q12-q21.1) from absent to 
1326 arbitrary units). It may be anticipated from these data 
that significant clustering of genes with an increased 
expression to a certain chromosomal area indicates an increased 
likelihood of gain of chromosomal material in this area. 

Table II 

Proteins whose expression level correlates with both mRNA and gene dose changes 

Protein                 Chromosomal location   Tumor TCC   CGH alteration   Transcript alteration   Protein alteration 
Annexin II              1q21                   733         Gain             Abs to Pres(a)          Increase 
Annexin IV              2p13                   733         Gain             3.9-Fold up             Increase 
Cytokeratin 17          17q12-q21              827         Gain             3.8-Fold up             Increase 
Cytokeratin 20          17q21.1                827         Gain             5.6-Fold up             Increase 
(PA-)FABP               8q21.2                 827         Loss             10-Fold down            Decrease 
FBP1                    9q22                   827         Gain             2.3-Fold up             Increase 
Plasma gelsolin         9q31                   827         Gain             Abs to Pres             Increase 
Heat shock protein 28   15q12-q13              827         Loss             2.5-Fold up             Decrease 
Prohibitin              17q21                  827/733     Gain             3.7-/2.5-Fold up(b)     Increase 
Prolyl-4-hydroxyl       17q25                  827/733     Gain             5.7-/1.6-Fold up        Increase 
hnRNP B1                7p15                   827         Loss             2.5-Fold down           Decrease 

(a) Abs, absent; Pres, present. 
(b) In cases where the corresponding alterations were found in both TCCs 827 and 733 these are shown as 827/733. 

Considering the many possible regulatory mechanisms acting 
at the level of transcription, it seems striking that the gene 
dose effects were so clearly detectable in gained areas. One 
hypothetical explanation may lie in the loss of controlled 
methylation in tumor cells (17-19). Thus, it may be possible 
that in chromosomes with increased DNA copy numbers two 
or more alleles could be demethylated simultaneously, leading 
to a higher transcription level, whereas in chromosomes with 
losses the remaining allele could be partly methylated, turning 
off the process (20, 21). A recent report has documented a 
ploidy regulation of gene expression in yeast, but in this case all 
the genes were present in the same ratio (22), a situation that is 
not analogous to that of cancer cells, which show marked 
chromosomal aberrations, as well as gene dosage effects. 

Several CGH studies of bladder cancer have shown that 
some chromosomal aberrations are common at certain 
stages of disease progression, often occurring in more than 1 
of 3 tumors. In pTa tumors, these include 9p-, 9q-, 1q+, Y- 
(2, 6), and in pT1 tumors, 2q-, 11p-, 11q-, 1q+, 5p+, *q+, 
17q+, and 20q+ (2-4, 6, 7). The pTa tumors studied here 
showed similar aberrations such as 9p- and 9q22-q33- and 
9q- and Y-, respectively. Likewise, the two minimally invasive 
pT1 tumors showed aberrations that are commonly seen at 
that stage, and TCC 827 had a remarkable resemblance to the 
commonly seen pattern of losses and gains, such as 1q22-24 
amplification (seen in both tumors), 11q14-q22 loss, the latter 
often linked to 17q+ (both tumors), and 1q+ and 9p-, often 
linked to 20q+ and 11q13+ (both tumors) (7-9). These 
observations indicate that the pairs of tumors used in this study 
exhibit chromosomal changes observed in many tumors, and 
therefore the findings could be of general importance for 
bladder cancer. 

Considering that the mapping resolution of CGH is about 
20 megabases, it is only possible to get a crude picture of 
chromosomal instability using this technique. Occasionally, 
we observed reduced transcript levels close to or inside 
regions with increased copy numbers. Analysis of these regions 
by positioning heterozygous microsatellites as close as 
possible to the locus showing reduced gene expression revealed 
loss of heterozygosity in several cases. It seems likely that 
multiple and different events occur along each chromosomal 
arm and that the use of cDNA microarrays for analysis of DNA 
copy number changes will reach a resolution that can resolve 
these changes, as has recently been proposed (2). The outlier 
data were not more frequent at the boundaries of the CGH 
aberrations. At present we do not know the mechanism 
behind chromosomal aneuploidy and cannot predict whether 
chromosomal gains will be transcribed to a larger extent than 
the two native alleles. A mechanism such as genetic imprinting has 
an impact on the expression level in normal cells and is often 
reduced in tumors. However, the relation between imprinting 
and gain of chromosomal material is not known. 

We regard it as a strength of this investigation that we were 
able to compare invasive tumors to benign tumors rather than 
to normal urothelium, as the tumors studied were biologically 
very close and probably may represent successive steps in 
the progression of bladder cancer. Despite the limited amount 
of fresh tissue available it was possible to apply three different 
state of the art methods. The observed correlation between 
DNA copy number and mRNA expression is remarkable when 
one considers that different pieces of the tumor biopsies were 
used for the different sets of experiments. This indicates that 
bladder tumors are relatively homogenous, a notion recently 
supported by CGH and LOH data that showed a remarkable 
similarity even between tumors and distant metastasis (10, 23). 

In the few cases analyzed, mRNA and protein levels 
showed a striking correspondence, although in some cases 
we found discrepancies that may be attributed to translational 
regulation, post-translational processing, protein 
degradation, or a combination of these. Some transcripts belong to 
undertranslated mRNA pools, which are associated with few 
translationally inactive ribosomes; these pools, however, 
seem to be rare (24). Protein degradation, for example, may 
be very important in the case of polypeptides with a short 
half-life (e.g. signaling proteins). A poor correlation between 
mRNA and protein levels was found in liver cells as 
determined by arrays and 2D-PAGE (25), and a moderate 
correlation was recently reported by Ideker et al. (26) in yeast. 
Interestingly, our study revealed a much better correlation 
between gained chromosomal areas and increased mRNA 
levels than between loss of chromosomal areas and reduced 
mRNA levels. In general, the level of CGH change determined 
the ability to detect a change in transcript. One possible 
explanation could be that by losing one allele the change in 
mRNA level is not so dramatic as compared with gain of 
material, which can be rather unlimited and may lead to a 
severalfold increase in gene copy number resulting in a much 
higher impact on transcript level. The latter would be much 
easier to detect on the expression arrays as the cut-off point 
was placed at a 2-fold level so as not to be biased by noise on 
the array. Construction of arrays with a better signal to noise 
ratio may in the future allow detection of lesser than 2-fold 
alterations in transcript levels, a feature that may facilitate the 
analysis of the effect of loss of chromosomal areas on 
transcript levels. 
In eleven cases we found a significant correlation between 
DNA copy number, mRNA expression, and protein level. Four 
of these proteins were encoded by genes located at a fre- 
quently amplified area in chromosome 17q. Whether DNA 
copy number is one of the mechanisms behind alteration of 
these eleven proteins is at present unknown and will have to 
be proved by other methods using a larger number of sam- 
ples. One factor making such studies complicated is the large 
extent of protein modification that occurs after translation, 
requiring immunoidentification and/or mass spectrometry to 
correctly identify the proteins in the gels. 

In conclusion, the results presented in this study exemplify 
the large body of knowledge that may be possible to gather in 
the future by combining state of the art techniques that follow 
the pathway from DNA to protein (26). Here, we used a tradi- 
tional chromosomal CGH method, but in the future high reso- 
lution CGH based on microarrays with many thousand radiation 
hybrid-mapped genes will increase the resolution and informa- 
tion derived from these types of experiments (2). Combined with 
expression arrays analyzing transcripts derived from genes with 
known locations, and 2D gel analysis to obtain information at 
the post-translational level, a clearer and more developed un- 
derstanding of the tumor genome will be forthcoming. 
ABSTRACT 

Genetic changes underlie tumor progression and may lead to 
cancer-specific expression of critical genes. Over 1100 publications have 
described the use of comparative genomic hybridization (CGH) to analyze 
the pattern of copy number alterations in cancer, but very few of the genes 
affected are known. Here, we performed high-resolution CGH analysis on 
cDNA microarrays in breast cancer and directly compared copy number 
and mRNA expression levels of 13,824 genes to quantitate the impact of 
genomic changes on gene expression. We identified and mapped the 
boundaries of 24 independent amplicons, ranging in size from 0.2 to 12 
Mb. Throughout the genome, both high- and low-level copy number 
changes had a substantial impact on gene expression, with 44% of the 
highly amplified genes showing overexpression and 10.5% of the highly 
overexpressed genes being amplified. Statistical analysis with random 
permutation tests identified 270 genes whose expression levels across 14 
samples were systematically attributable to gene amplification. These 
included most previously described amplified genes in breast cancer and 
many novel targets for genomic alterations, including the HOXB7 gene, 
the presence of which in a novel amplicon at 17q21.3 was validated in 
10.2% of primary breast cancers and associated with poor patient 
prognosis. In conclusion, CGH on cDNA microarrays revealed hundreds of 
novel genes whose overexpression is attributable to gene amplification. 
These genes may provide insights to the clonal evolution and progression 
of breast cancer and highlight promising therapeutic targets. 

INTRODUCTION 

Gene expression patterns revealed by cDNA microarrays have 
facilitated classification of cancers into biologically distinct catego- 
ries, some of which may explain the clinical behavior of the tumors 
(1-6). Despite this progress in diagnostic classification, the molecular 
mechanisms underlying gene expression patterns in cancer have re- 
mained elusive, and the utility of gene expression profiling in the 
identification of specific therapeutic targets remains limited. 

Accumulation of genetic defects is thought to underlie the clonal 
evolution of cancer. Identification of the genes that mediate the effects 
of genetic changes may be important by highlighting transcripts that 
are actively involved in tumor progression. Such transcripts and their 
encoded proteins would be ideal targets for anticancer therapies, as 
demonstrated by the clinical success of new therapies against 
amplified oncogenes, such as ERBB2 and EGFR (7, 8), in breast cancer and 
other solid tumors. Besides amplifications of known oncogenes, over 
20 recurrent regions of DNA amplification have been mapped in 
breast cancer by CGH³ (9, 10). However, these amplicons are often 
large and poorly defined, and their impact on gene expression remains 
unknown. 

Fig. 1. Impact of gene copy number on global gene expression levels. A, percentage of 
over- and underexpressed genes (Y axis) according to copy number ratios (X axis). 
Threshold values used for over- and underexpression were >2.184 (global upper 7% of 
the cDNA ratios) and <0.4826 (global lower 7% of the expression ratios). B, percentage 
of amplified and deleted genes according to expression ratios. Threshold values for 
amplification and deletion were >1.5 and <0.7. 

We hypothesized that genome-wide identification of those gene 
expression changes that are attributable to underlying gene copy 
number alterations would highlight transcripts that are actively 
involved in the causation or maintenance of the malignant phenotype. 
To identify such transcripts, we applied a combination of cDNA and 
CGH microarrays to: (a) determine the global impact that gene copy 
number variation plays in breast cancer development and progression; 
and (b) identify and characterize those genes whose mRNA 
expression is most significantly associated with amplification of the 
corresponding genomic template. 

³ The abbreviations used are: CGH, comparative genomic hybridization; FISH, 
fluorescence in situ hybridization; RT-PCR, reverse transcription-PCR. 

Fig. 2. Genome-wide copy number and expression analysis in the MCF-7 breast cancer cell line. A, chromosomal CGH analysis of MCF-7. The copy number ratio profile (blue 
line) across the entire genome from 1p telomere to Xq telomere is shown along with ±1 SD (orange lines). The black horizontal line indicates a ratio of 1.0; red line, a ratio of 0.8; 
and green line, a ratio of 1.2. B-C, genome-wide copy number analysis in MCF-7 by CGH on cDNA microarray. The copy number ratios were plotted as a function of the position 
of the cDNA clones along the human genome. In B, individual data points are connected with a line, and a moving median of 10 adjacent clones is shown. Red horizontal line, the 
copy number ratio of 1.0. In C, individual data points are labeled by color coding according to cDNA expression ratios. The bright red dots indicate the upper 2%, and dark red dots, 
the next 5% of the expression ratios in MCF-7 cells (overexpressed genes); bright green dots indicate the lowest 2%, and dark green dots, the next 5% of the expression ratios 
(underexpressed genes); the rest of the observations are shown with black crosses. The chromosome numbers are shown at the bottom of the figure, and chromosome boundaries are 
indicated with a dashed line. 

MATERIALS AND METHODS 

Breast Cancer Cell Lines. Fourteen breast cancer cell lines (BT-20, 
BT-474, HCC1428, Hs578t, MCF7, MDA-361, MDA-436, MDA-453, MDA-468, 
SKBR-3, T-47D, UACC812, ZR-75-1, and ZR-75-30) were obtained from the 
American Type Culture Collection (Manassas, VA). Cells were grown under 
recommended culture conditions. Genomic DNA and mRNA were isolated 
using standard protocols. 

Copy Number and Expression Analyses by cDNA Microarrays. The 
preparation and printing of the 13,824 cDNA clones on glass slides were 
performed as described (11-13). Of these clones, 244 represented 
uncharacterized expressed sequence tags, and the remainder corresponded to known 
genes. CGH experiments on cDNA microarrays were done as described (14, 
15). Briefly, 20 µg of genomic DNA from breast cancer cell lines and normal 
human WBCs were digested for 14-18 h with AluI and RsaI (Life 
Technologies, Inc., Rockville, MD) and purified by phenol/chloroform extraction. Six 
µg of digested cell line DNAs were labeled with Cy3-dUTP (Amersham 
Pharmacia) and normal DNA with Cy5-dUTP (Amersham Pharmacia) using 
the Bioprime Labeling kit (Life Technologies, Inc.). Hybridization (14, 15) and 
posthybridization washes (13) were done as described. For the expression 
analyses, a standard reference (Universal Human Reference RNA; Stratagene, 
La Jolla, CA) was used in all experiments. Forty µg of reference RNA were 
labeled with Cy3-dUTP and 3.5 µg of test mRNA with Cy5-dUTP, and the 
labeled cDNAs were hybridized on microarrays as described (13, 15). For both 
microarray analyses, a laser confocal scanner (Agilent Technologies, Palo 
Alto, CA) was used to measure the fluorescence intensities at the target 
locations using the DEARRAY software (16). After background subtraction, 
average intensities at each clone in the test hybridization were divided by the 
average intensity of the corresponding clone in the control hybridization. For 
the copy number analysis, the ratios were normalized on the basis of the 
distribution of ratios of all targets on the array and for the expression analysis 
on the basis of 88 housekeeping genes, which were spotted four times onto the 
array. Low quality measurements (i.e., copy number data with mean reference 
intensity <100 fluorescent units, and expression data with both test and 
reference intensity <100 fluorescent units and/or with spot size <50 units) 
were excluded from the analysis and were treated as missing values. The 
distributions of fluorescence ratios were used to define cutpoints for 
increased/decreased copy number. Genes with CGH ratio >1.43 (representing the upper 
5% of the CGH ratios across all experiments) were considered to be amplified, 
and genes with ratio <0.73 (representing the lower 5%) were considered to be 
deleted. 
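
The cutpoint rule above can be illustrated with a minimal Python sketch. It assumes the CGH ratios are available as a flat list with missing values stored as None. The helper names and the nearest-rank percentile rule are illustrative assumptions, not the authors' software; only the 1.43 and 0.73 cutpoints come from the text.

```python
import math

AMPLIFIED_CUTPOINT = 1.43  # upper 5% of CGH ratios, as stated in the text
DELETED_CUTPOINT = 0.73    # lower 5% of CGH ratios, as stated in the text


def percentile_cutpoints(ratios, tail=0.05):
    """Return (lower, upper) cutpoints taken from the two tails of the ratios.

    Hypothetical helper using a simple nearest-rank rule: with k = ceil(tail * n),
    the cutpoints are the k-th smallest and the k-th largest usable value.
    Missing values (None) are ignored.
    """
    values = sorted(r for r in ratios if r is not None)
    if not values:
        raise ValueError("no usable ratios")
    n = len(values)
    k = max(1, math.ceil(tail * n))
    return values[k - 1], values[n - k]


def classify_copy_number(ratio, low=DELETED_CUTPOINT, high=AMPLIFIED_CUTPOINT):
    """Label one CGH ratio as 'amplified', 'deleted', 'normal', or 'missing'."""
    if ratio is None:
        return "missing"
    if ratio > high:
        return "amplified"
    if ratio < low:
        return "deleted"
    return "normal"
```

In this sketch the cutpoints would be derived once from the pooled ratios of all experiments and then applied gene by gene.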

Statistical Analysis of CGH and cDNA Microarray Data. To evaluate 
the influence of copy number alterations on gene expression, we applied the 
following statistical approach. CGH and cDNA calibrated intensity ratios were 
log-transformed and normalized using median centering of the values in each 
cell line. Furthermore, cDNA ratios for each gene across all 14 cell lines were 
median centered. For each gene, the CGH data were represented by a vector 
that was labeled 1 for amplification (ratio, >1.43) and 0 for no amplification. 
Amplification was correlated with gene expression using the signal-to-noise 
statistics (1). We calculated a weight, $w_g$, for each gene as follows: 

$$w_g = \frac{m_{g1} - m_{g0}}{\sigma_{g1} + \sigma_{g0}}$$

where $m_{g1}$, $\sigma_{g1}$ and $m_{g0}$, $\sigma_{g0}$ denote the means and SDs for the expression 
levels for amplified and nonamplified cell lines, respectively. To assess the 
statistical significance of each weight, we performed 10,000 random 
permutations of the label vector. The probability that a gene had a larger or equal 
weight by random permutation than the original weight was denoted by α. A 
low α (<0.05) indicates a strong association between gene expression and 
amplification. 
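
The weight and its permutation test can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the sample SD is used (the text does not say which SD estimator was applied), and genes with fewer than two cell lines in either group are skipped by returning None.

```python
import random
import statistics


def signal_to_noise(expression, labels):
    """Weight (m1 - m0) / (s1 + s0) for one gene.

    expression: log-ratio expression values, one per cell line.
    labels: 1 for amplified, 0 for nonamplified, in the same order.
    Returns None when a group has fewer than two values or the SDs sum to zero
    (an assumed convention for genes that cannot be scored).
    """
    amp = [x for x, lab in zip(expression, labels) if lab == 1]
    non = [x for x, lab in zip(expression, labels) if lab == 0]
    if len(amp) < 2 or len(non) < 2:
        return None
    spread = statistics.stdev(amp) + statistics.stdev(non)
    if spread == 0:
        return None
    return (statistics.mean(amp) - statistics.mean(non)) / spread


def permutation_alpha(expression, labels, n_perm=10000, seed=0):
    """Fraction of random label permutations with weight >= the observed weight."""
    observed = signal_to_noise(expression, labels)
    if observed is None:
        return None
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        w = signal_to_noise(expression, shuffled)
        if w is not None and w >= observed:
            hits += 1
    return hits / n_perm
```

A gene would then be reported when `permutation_alpha` falls below 0.05, matching the threshold given in the text.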

Genomic Localization of cDNA Clones and Amplicon Mapping. Each 
cDNA clone on the microarray was assigned to a Unigene cluster using the 
Unigene Build 141.⁶ A database of genomic sequence alignment information 
for mRNA sequences was created from the August 2001 freeze of the 
University of California Santa Cruz's GoldenPath database.⁷ The chromosome and 
bp positions for each cDNA clone were then retrieved by relating these data 
sets. Amplicons were defined as a CGH copy number ratio >2.0 in at least two 
adjacent clones in two or more cell lines or a CGH ratio >2.0 in at least three 
adjacent clones in a single cell line. The amplicon start and end positions were 
extended to include neighboring nonamplified clones (ratio, <1.5). The 
amplicon size determination was partially dependent on local clone density. 

⁶ Internet address: http://research.nhgri.nih.gov/microarray/downloadable_cdna.html. 
⁷ Internet address: www.genome.ucsc.edu. 

FISH. Dual-color interphase FISH to breast cancer cell lines was done as 
described (17). Bacterial artificial chromosome clone RP11-361K8 was 
labeled with SpectrumOrange (Vysis, Downers Grove, IL), and 
SpectrumOrange-labeled probe for EGFR was obtained from Vysis. 
SpectrumGreen-labeled chromosome 7 and 17 centromere probes (Vysis) were used as a 
reference. A tissue microarray containing 612 formalin-fixed, 
paraffin-embedded primary breast cancers (17) was applied in FISH analyses as described 
(18). The use of these specimens was approved by the Ethics Committee of the 
University of Basel and by the NIH. Specimens containing a 2-fold or higher 
increase in the number of test probe signals, as compared with corresponding 
centromere signals, in at least 10% of the tumor cells were considered to be 
amplified. Survival analysis was performed using the Kaplan-Meier method 
and the log-rank test. 

RT-PCR. The HOXB7 expression level was determined relative to 
GAPDH. Reverse transcription and PCR amplification were performed using 
Access RT-PCR System (Promega Corp., Madison, WI) with 10 ng of mRNA 
as a template. HOXB7 primers were 5'-GAGCAGAGGGACTCGGACTT-3' 
and 5'-GCGTCAGGTAGCGATTGTAG-3'. 

Table 1 Summary of independent amplicons in 14 breast cancer cell lines by 
CGH microarray 

Location             Start (Mb)   End (Mb)   Size (Mb)
1p13                 132.79       132.94     0.2
1q21                 173.92       177.25     3.3
1q22                 179.28       179.57     0.3
3p14                 71.94        74.66      2.7
7p12.1-7p11.2        55.62        60.95      5.3
7q31                 125.73       130.96     5.2
7q32                 140.01       140.68     0.7
8q21.11-8q21.13      86.45        92.46      6.0
8q21.3               98.45        103.05     4.6
8q23.3-8q24.14       129.88       142.15     12.3
8q24.22              151.21       152.16     1.0
9p13                 38.65        39.25      0.6
13q22-q31            77.15        81.38      4.2
16q22                86.70        87.62      0.9
17q11                29.30        30.85      1.6
17q12-q21.2          39.79        42.80      3.0
17q21.32-q21.33      52.47        55.80      3.3
17q22-q23.3          63.81        69.70      5.9
17q23.3-q24.3        69.93        74.99      5.1
19q13                40.63        41.40      0.8
20q11.22             34.59        35.85      1.3
20q13.12             44.00        45.62      1.6
20q13.12-q13.13      46.45        49.43      3.0
20q13.2-q13.32       51.32        59.12      7.8


CGH were validated, with 1q21, 17q12-q21.2, 17q22-q23, 20q13.1, 
and 20q13.2 regions being most commonly amplified. Furthermore, 
the boundaries of these amplicons were precisely delineated. In 
addition, novel amplicons were identified at 9p13 (38.65-39.25 Mb), 
and 17q21.3 (52.47-55.80 Mb). 
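
The amplicon definition given in Materials and Methods (a CGH ratio >2.0 in at least two adjacent clones in two or more cell lines, or in at least three adjacent clones in a single cell line) can be sketched in Python as follows. This is one possible reading of that rule, offered as an assumption: the function name and the input layout (one list of ratios per cell line, clones sorted by genomic position on a single chromosome) are hypothetical, and the subsequent extension of boundaries to neighboring nonamplified clones is not modeled.

```python
def call_amplicons(ratios_by_line, threshold=2.0):
    """Return amplicons as (first_clone_index, last_clone_index) tuples.

    ratios_by_line: dict mapping cell line name -> list of CGH ratios for the
    clones of one chromosome, sorted by position (None marks a missing value).
    All lists are assumed to have the same length.
    """
    lines = list(ratios_by_line.values())
    n = len(lines[0]) if lines else 0
    high = [[r is not None and r > threshold for r in line] for line in lines]

    seeded = [False] * n
    for i in range(n - 1):
        # Rule 1: two adjacent clones above threshold in two or more cell lines.
        pair_count = sum(1 for h in high if h[i] and h[i + 1])
        if pair_count >= 2:
            seeded[i] = seeded[i + 1] = True
        # Rule 2: three adjacent clones above threshold in a single cell line.
        if i + 2 < n and any(h[i] and h[i + 1] and h[i + 2] for h in high):
            seeded[i] = seeded[i + 1] = seeded[i + 2] = True

    # Merge consecutive seeded clones into intervals.
    amplicons, start = [], None
    for i, flag in enumerate(seeded):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            amplicons.append((start, i - 1))
            start = None
    if start is not None:
        amplicons.append((start, n - 1))
    return amplicons
```

Clone indices returned by the sketch would still need to be translated into Mb positions using the clone map, which is where local clone density affects the reported amplicon size.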

Direct Identification of Putative Amplification Target Genes. 
The cDNA/CGH microarray technique enables the direct 
correlation of copy number and expression data on a gene-by-gene basis 
throughout the genome. We directly annotated high-resolution 
CGH plots with gene expression data using color coding. Fig. 2C 
shows that most of the amplified genes in the MCF-7 breast cancer 
cell line at 1p13, 17q22-q23, and 20q13 were highly 
overexpressed. A view of chromosome 7 in the MDA-468 cell line 
implicates EGFR as the most highly overexpressed and amplified 
gene at 7p11-p12 (Fig. 3A). In BT-474, the two known amplicons 
at 17q12 and 17q22-q23 contained numerous highly 
overexpressed genes (Fig. 3B). In addition, several genes, including the 
homeobox genes HOXB2 and HOXB7, were highly amplified in a 
previously undescribed independent amplicon at 17q21.3. HOXB7 
was systematically amplified (as validated by FISH, Fig. 3B, inset) 
as well as overexpressed (as verified by RT-PCR, data not shown) 
in BT-474, UACC812, and ZR-75-30 cells. Furthermore, this novel 




RESULTS 

Global Effect of Copy Number on Gene Expression. 13,824 
arrayed cDNA clones were applied for analysis of gene expression 
and gene copy number (CGH microarrays) in 14 breast cancer cell 
lines. The results illustrate a considerable influence of copy number 
on gene expression patterns. Up to 44% of the highly amplified 
transcripts (CGH ratio, >2.5) were overexpressed (i.e. belonged to 
the global upper 7% of expression ratios), compared with only 6% for 
genes with normal copy number levels (Fig. 1A). Conversely, 10.5% 
of the transcripts with high-level expression (cDNA ratio, >10) 
showed increased copy number (Fig. 1B). Low-level copy number 
increases and decreases were also associated with similar, although 
less dramatic, outcomes on gene expression (Fig. 1). 
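
As an illustration of how a percentage such as "44% of the highly amplified transcripts were overexpressed" could be tabulated, the following Python sketch counts overexpressed measurements among those above a copy number cutoff. The function name and the input layout (pairs of CGH ratio and expression ratio) are assumptions made for this example; the cutoffs passed in are the caller's choice.

```python
def percent_overexpressed(pairs, cgh_min, expr_cutoff):
    """Percentage of measurements with expression > expr_cutoff among those
    whose CGH ratio exceeds cgh_min.

    pairs: iterable of (cgh_ratio, expression_ratio) tuples.
    Returns None when no measurement passes the copy number cutoff.
    """
    selected = [e for c, e in pairs if c > cgh_min]
    if not selected:
        return None
    over = sum(1 for e in selected if e > expr_cutoff)
    return 100.0 * over / len(selected)
```

Swapping the roles of the two ratios gives the converse figure, the share of highly expressed transcripts that show increased copy number.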

Identification of Distinct Breast Cancer Amplicons. Base-pair 
locations obtained for 11,994 cDNAs (86.8%) were used to plot copy 
number changes as a function of genomic position (Fig. 2, Supple- 
ment Fig. A). The average spacing of clones throughout the genome 
was 267 kb. This high-resolution mapping identified 24 independent 
breast cancer amplicons, spanning from 0.2 to 12 Mb of DNA (Table 
1). Several amplification sites detected previously by chromosomal 

Fig. 3. Annotation of gene expression data on CGH microarray profiles. A, genes in the 
7p11-p12 amplicon in the MDA-468 cell line are highly expressed (red dots) and include 
the EGFR oncogene. B, several genes in the 17q12, 17q21.3, and 17q23 amplicons in the 
BT-474 breast cancer cell line are highly overexpressed (red) and include the HOXB7 
gene. The data labels and color coding are as indicated for Fig. 2C. Insets show 
chromosomal CGH profiles for the corresponding chromosomes and validation of the 
increased copy number by interphase FISH using EGFR (red) and chromosome 7 
centromere probe (green) to MDA-468 (A) and HOXB7-specific probe (red) and 
chromosome 17 centromere (green) to BT-474 cells (B). 
Fig. 4. List of 50 genes with a statistically 
significant correlation (α value <0.05) between 
gene copy number and gene expression. Name, 
chromosomal location, and the α value for each 
gene are indicated. The genes have been ordered 
according to their position in the genome. The color 
maps on the right illustrate the copy number and 
expression ratio patterns in the 14 cell lines. The 
key to the color code is shown at the bottom of the 
graph. Gray squares, missing values. The complete 
list of 270 genes is shown in Supplemental Fig. B. 
amplification was validated to be present in 10.2% of 363 primary 
breast cancers by FISH to a tissue microarray and was associated 
with poor prognosis of the patients (P = 0.001). 
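
The FISH scoring criterion stated in Materials and Methods (a 2-fold or higher increase in test probe signals relative to centromere signals in at least 10% of the tumor cells) can be sketched in Python as below. The function name and the per-cell input format are assumptions for illustration, as is the choice to skip cells with no centromere signal.

```python
def is_amplified(cells, min_fold=2.0, min_fraction=0.10):
    """Apply the per-specimen FISH amplification rule.

    cells: list of (test_signals, centromere_signals) counts, one pair per
    scored tumor cell. Cells with zero centromere signals are skipped
    (an assumed convention). Returns None when no cell can be scored.
    """
    scorable = [(t, c) for t, c in cells if c > 0]
    if not scorable:
        return None
    positive = sum(1 for t, c in scorable if t / c >= min_fold)
    return positive / len(scorable) >= min_fraction
```

Under this rule a specimen with one cell in ten showing a 3:1 signal ratio would be called amplified, whereas one cell in twenty would not.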

Statistical Identification and Characterization of 270 Highly 
Expressed Genes in Amplicons. Statistical comparison of expres- 
sion levels of all genes as a function of gene amplification identified 
270 genes whose expression was significantly influenced by copy 
number across all 14 cell lines (Fig. 4, Supplemental Fig. B). 
According to the gene ontology data,⁸ 91 of the 270 genes represented 
hypothetical proteins or genes with no functional annotation, whereas 
179 had associated functional information available. Of these, 151 
(84%) are implicated in apoptosis, cell proliferation, signal transduc- 
tion, and transcription, whereas 28 (16%) had functional annotations 
that could not be directly linked with cancer. 



DISCUSSION 

The importance of recurrent gene and chromosome copy number 
changes in the development and progression of solid tumors has been 
characterized in >1000 publications applying CGH⁹ (9, 10), as well 
as in a large number of other molecular cytogenetic, cytogenetic, and 
molecular genetic studies. The effects of these somatic genetic 
changes on gene expression levels have remained largely unknown, 
although a few studies have explored gene expression changes occur- 
ring in specific amplicons (15, 19-21). Here, we applied genome- 
wide cDNA microarrays to identify transcripts whose expression 
changes were attributable to underlying gene copy number alterations 
in breast cancer. 

⁸ Internet address: http://www.geneontology.org/. 
⁹ Internet address: http://www.ncbi.nlm.nih.gov/entrez. 

The overall impact of copy number on gene expression patterns was 
substantial, with the most dramatic effects seen in the case of 
high-level copy number increase. Low-level copy number gains and losses 
also had a significant influence on expression levels of genes in the 
regions affected, but these effects were more subtle on a gene-by-gene 
basis than those of high-level amplifications. However, the impact of 
low-level gains on the dysregulation of gene expression patterns in 
cancer may be equally important if not more important than that of 
high-level amplifications. Aneuploidy and low-level gains and losses 
of chromosomal arms represent the most common types of genetic 
alterations in breast and other cancers and, therefore, have an influ- 
ence on many genes. Our results in breast cancer extend the recent 
studies on the impact of aneuploidy on global gene expression 
patterns in yeast cells, acute myeloid leukemia, and a prostate cancer 
model system (22-24). 

The CGH microarray analysis identified 24 independent breast 
cancer amplicons. We defined the precise boundaries for many 
amplicons detected previously by chromosomal CGH (9, 10, 25, 26) and 
also discovered novel amplicons that had not been detected 
previously, presumably because of their small size (only 1-2 Mb) or close 
proximity to other larger amplicons. One of these novel amplicons 
involved the homeobox gene region at 17q21.3 and led to the 
overexpression of the HOXB7 and HOXB2 genes. The homeodomain 
transcription factors are known to be key regulators of embryonic 
development and have been occasionally reported to undergo aberrant 
expression in cancer (27, 28). HOXB7 transfection induced cell 
proliferation in melanoma, breast, and ovarian cancer cells and increased 
tumorigenicity and angiogenesis in breast cancer (29-32). The 
present results imply that gene amplification may be a prominent 
mechanism for overexpressing HOXB7 in breast cancer and suggest that 
HOXB7 contributes to tumor progression and confers an aggressive 
disease phenotype in breast cancer. This view is supported by our 
finding of amplification of HOXB7 in 10% of 363 primary breast 
cancers, as well as an association of amplification with poor prognosis 
of the patients. 

We carried out a systematic search to identify genes whose 
expression levels across all 14 cell lines were attributable to 
amplification status. Statistical analysis revealed 270 such genes 
(representing ~2% of all genes on the array), including not only 
previously described amplified genes, such as HER-2, MYC, 
EGFR, ribosomal protein s6 kinase, and AIB3, but also numerous 
novel genes such as NRAS-related gene (lpl3), syndecan-2 (8q22), 
and bone morphogenic protein (20ql3.1), whose activation by 
amplification may similarly promote breast cancer progression. 
Most of the 270 genes have not been implicated previously in 
breast cancer development and suggest novel pathogenetic mech- 
anisms. Although we would not expect all of them to be causally 
involved, it is intriguing that 84% of the genes with associated 
functional information were implicated in apoptosis, cell prolifer- 
ation, signal transduction, transcription, or other cellular processes 
that could directly imply a possible role in cancer progression. 
Therefore, a detailed characterization of these genes may provide 
biological insights to breast cancer progression and might lead to 
the development of novel therapeutic strategies. 

In summary, we demonstrate application of cDNA microarrays 
to the analysis of both copy number and expression levels of over 
12,000 transcripts throughout the breast cancer genome, roughly 
once every 267 kb. This analysis provided: (a) evidence of a 
prominent global influence of copy number changes on gene 
expression levels; (b) a high-resolution map of 24 independent 
amplicons in breast cancer; and (c) identification of a set of 270 
genes, the overexpression of which was statistically attributable to 
gene amplification. Characterization of a novel amplicon at 
17q21.3 implicated amplification and overexpression of the 
HOXB7 gene in breast cancer, including a clinical association 
between HOXB7 amplification and poor patient prognosis. Overall, 
our results illustrate how the identification of genes activated by 
gene amplification provides a powerful approach to highlight 
genes with an important role in cancer as well as to prioritize and 
validate putative targets for therapy development. 
Genomic DNA copy number alterations are key genetic events in 
the development and progression of human cancers. Here we 
report a genome-wide microarray comparative genomic hybrid- 
ization (array CGH) analysis of DNA copy number variation in 
a series of primary human breast tumors. We have profiled DNA 
copy number alteration across 6,691 mapped human genes, in 44 
predominantly advanced, primary breast tumors and 10 breast 
cancer cell lines. While the overall patterns of DNA amplification 
and deletion corroborate previous cytogenetic studies, the high- 
resolution (gene-by-gene) mapping of amplicon boundaries and 
the quantitative analysis of amplicon shape provide significant 
improvement in the localization of candidate oncogenes. Parallel 
microarray measurements of mRNA levels reveal the remarkable 
degree to which variation in gene copy number contributes to 
variation in gene expression in tumor cells. Specifically, we find 
that 62% of highly amplified genes show moderately or highly 
elevated expression, that DNA copy number influences gene ex- 
pression across a wide range of DNA copy number alterations 
(deletion, low-, mid- and high-level amplification), that on average, 
a 2-fold change in DNA copy number is associated with a corre- 
sponding 1.5-fold change in mRNA levels, and that overall, at least 
12% of all the variation in gene expression among the breast 
tumors is directly attributable to underlying variation in gene copy 
number. These findings provide evidence that widespread DNA 
copy number alteration can lead directly to global deregulation of 
gene expression, which may contribute to the development or 
progression of cancer. 

Conventional cytogenetic techniques, including comparative 
genomic hybridization (CGH) (1), have led to the identifi- 
cation of a number of recurrent regions of DNA copy number 
alteration in breast cancer cell lines and tumors (2-4). While 
some of these regions contain known or candidate oncogenes 
[e.g., FGFR1 (8p11), MYC (8q24), CCND1 (11q13), ERBB2 
(17q12), and ZNF217 (20q13)] and tumor suppressor genes 
[RB1 (13q14) and TP53 (17p13)], the relevant gene(s) within 
other regions (e.g., gain of 1q, 8q22, and 17q22-24, and loss of 
8p) remain to be identified. A high-resolution genome-wide 
map, delineating the boundaries of DNA copy number alter- 
ations in tumors, should facilitate the localization and identifi- 
cation of oncogenes and tumor suppressor genes in breast 
cancer. In this study, we have created such a map, using 
array-based CGH (5-7) to profile DNA copy number alteration 
in a series of breast cancer cell lines and primary tumors. 

An unresolved question is the extent to which the widespread 
DNA copy number changes that we and others have identified 
in breast tumors alter expression of genes within involved 
regions. Because we had measured mRNA levels in parallel in 
the same samples (8), using the same DNA microarrays, we had 
an opportunity to explore on a genomic scale the relationship 
between DNA copy number changes and gene expression. From 
this analysis, we have identified a significant impact of 
widespread DNA copy number alteration on the transcriptional 
programs of breast tumors. 

Materials and Methods 

Tumors and Cell Lines. Primary breast tumors were predominantly 
large (>3 cm), intermediate-grade, infiltrating ductal carcino- 
mas, with more than 50% being lymph node positive. The 
fraction of tumor cells within specimens averaged at least 50%. 
Details of individual tumors have been published (8, 9), and 
are summarized in Table 1, which is published as supporting 
information on the PNAS web site, www.pnas.org. Breast cancer 
cell lines were obtained from the American Type Culture 
Collection. Genomic DNA was isolated either using Qiagen 
genomic DNA columns, or by phenol/chloroform extraction 
followed by ethanol precipitation. 

DNA Labeling and Microarray Hybridizations. Genomic DNA label- 
ing and hybridizations were performed essentially as described 
in Pollack et al. (7), with slight modifications. Two micrograms 
of DNA was labeled in a total volume of 50 microliters and the 
volumes of all reagents were adjusted accordingly. "Test" DNA 
(from tumors and cell lines) was fluorescently labeled (Cy5) and 
hybridized to a human cDNA microarray containing 6,691 
different mapped human genes (i.e., UniGene clusters). The 
"reference" (labeled with Cy3) for each hybridization was nor- 
mal female leukocyte DNA from a single donor. The fabrication 
of cDNA microarrays and the labeling and hybridization of 
mRNA samples have been described (8). 

Data Analysis and Map Positions. Hybridized arrays were scanned 
on a GenePix scanner (Axon Instruments, Foster City, CA), and 
fluorescence ratios (test/reference) calculated using scanalyze 
software (available at http://rana.lbl.gov). Fluorescence ratios 
were normalized for each array by setting the average log 
fluorescence ratio for all array elements equal to 0. Measure- 
ments with fluorescence intensities more than 20% above back- 
ground were considered reliable. DNA copy number profiles 
that deviated significantly from background ratios measured in 
normal genomic DNA control hybridizations were interpreted as 
evidence of real DNA copy number alteration (see Estimating 
Significance of Altered Fluorescence Ratios in the supporting 
information). When indicated, DNA copy number profiles are 
displayed as a moving average (symmetric 5-nearest neighbors). 
Map positions for arrayed human cDNAs were assigned by 

Fig. 1. Genome-wide measurement of DNA copy number alteration by array CGH. (a) DNA copy number profiles are illustrated for cell lines containing different 
numbers of X chromosomes, for breast cancer cell lines, and for breast tumors. Each row represents a different cell line or tumor, and each column represents 
one of 6,691 different mapped human genes present on the microarray, ordered by genome map position from 1pter through Xqter. Moving average (symmetric 
5-nearest neighbors) fluorescence ratios (test/reference) are depicted using a log2-based pseudocolor scale (indicated), such that red luminescence reflects 
fold-amplification, green luminescence reflects fold-deletion, and black indicates no change (gray indicates poorly measured data). (b) Enlarged view of DNA 
copy number profiles across the X chromosome, shown for cell lines containing different numbers of X chromosomes. 



identifying the starting position of the best and longest match of 
any DNA sequence represented in the corresponding UniGene 
cluster (10) against the "Golden Path" genome assembly 
(http://genome.ucsc.edu/; Oct 7, 2000 Freeze). For UniGene 
clusters represented by multiple arrayed elements, mean fluo- 
rescence ratios (for all elements representing the same UniGene 
cluster) are reported. For mRNA measurements, fluorescence 
ratios are "mean-centered" (i.e., reported relative to the mean 
ratio across the 44 tumor samples). The data set described here 
can be accessed in its entirety in the supporting information. 
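
As an illustration of the normalization and smoothing steps described above, the following minimal Python sketch centers the log fluorescence ratios of one array on zero and then applies a moving average over genes in map order. It is not the authors' software. The function names are hypothetical, base-2 logarithms are assumed (following the figure legends), and "symmetric 5-nearest neighbors" is assumed here to mean a five-gene window centered on each gene.

```python
import math


def normalize_log_ratios(ratios):
    """Convert test/reference ratios to log2 and center them on a mean of 0.

    Entries given as None (assumed here to mark unreliable spots) stay None
    and do not contribute to the array-wide mean.
    """
    logs = [math.log2(r) if r is not None else None for r in ratios]
    valid = [x for x in logs if x is not None]
    mean = sum(valid) / len(valid)
    return [x - mean if x is not None else None for x in logs]


def moving_average(values, half_window=2):
    """Symmetric moving average over genes ordered by map position.

    half_window=2 gives a five-gene window (assumed reading of
    "symmetric 5-nearest neighbors"). The window is truncated at the
    ends of the list, and None entries are skipped.
    """
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        window = [v for v in values[lo:hi] if v is not None]
        smoothed.append(sum(window) / len(window) if window else None)
    return smoothed


# Hypothetical ratios for five neighboring genes on one array.
profile = moving_average(normalize_log_ratios([1.0, 2.0, 4.0, 2.0, 1.0]))
```

Truncating the window at the ends of the list is a design choice made for this sketch; the source does not say how the ends of chromosomes were handled.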

Results 

We performed CGH on 44 predominantly locally advanced, 
primary breast tumors and 10 breast cancer cell lines, using 
cDNA microarrays containing 6,691 different mapped human 
genes (Fig. 1a; also see Materials and Methods for details of 
microarray hybridizations). To take full advantage of the im- 
proved spatial resolution of array CGH, we ordered (fluores- 
cence ratios for) the 6,691 cDNAs according to the "Golden 
Path" (http://genome.ucsc.edu/) genome assembly of the draft 
human genome sequences (11). In so doing, arrayed cDNAs not 
only themselves represent genes of potential interest (e.g., 
candidate oncogenes within amplicons), but also provide precise 
genetic landmarks for chromosomal regions of amplification and 



deletion. Parallel analysis of DNA from cell lines containing 
different numbers of X chromosomes (Fig. 1b), as we did before 
(7), demonstrated the sensitivity of our method to detect single- 
copy loss (45,XO), and 1.5- (47,XXX), 2- (48,XXXX), or 
2.5-fold (49,XXXXX) gains (also see Fig. 5, which is published 
as supporting information on the PNAS web site). Fluorescence 
ratios were linearly proportional to copy number ratios, which 
were slightly underestimated, in agreement with previous ob- 
servations (7). Numerous DNA copy number alterations were 
evident in both the breast cancer cell lines and primary tumors 
(Fig. 1a), detected in the tumors despite the presence of euploid 
non-tumor cell types; the magnitudes of the observed changes 
were generally lower in the tumor samples. DNA copy-number 
alterations were found in every cancer cell line and tumor, and 
on every human chromosome in at least one sample. Recurrent 
regions of DNA copy number gain and loss were readily iden- 
tifiable. For example, gains within 1q, 8q, 17q, and 20q were 
observed in a high proportion of breast cancer cell lines/tumors 
(90%/69%, 100%/47%, 100%/60%, and 90%/44%, respective- 
ly), as were losses within 1p, 3p, 8p, and 13q (80%/24%, 
80%/22%, 80%/22%, and 70%/18%, respectively), consistent 
with published cytogenetic studies (refs. 2-4; a complete listing 
of gains/losses is provided in Tables 2 and 3, which are published 
as supporting information on the PNAS web site). The total 

Fig. 2. DNA copy number alteration across chromosome 8 by array CGH. (a) DNA copy number profiles are illustrated for cell lines containing different numbers 
of X chromosomes, for breast cancer cell lines, and for breast tumors. Breast cancer cell lines and tumors are separately ordered by hierarchical clustering to 
highlight recurrent copy number changes. The 241 genes present on the microarrays and mapping to chromosome 8 are ordered by position along the 
chromosome. Fluorescence ratios (test/reference) are depicted by a log2 pseudocolor scale (indicated). Selected genes are indicated with color-coded text (red, 
increased; green, decreased; black, no change; gray, not well measured) to reflect correspondingly altered mRNA levels (observed in the majority of the subset 
of samples displaying the DNA copy number change). The map positions for genes of interest that are not represented on the microarray are indicated in the 
row above those genes represented on the array. (b) Graphical display of DNA copy number profile for breast cancer cell line SKBR3. Fluorescence ratios 
(tumor/normal) are plotted on a log2 scale for chromosome 8 genes, ordered along the chromosome. 



number of genomic alterations (gains and losses) was found to 
be significantly higher in breast tumors that were high grade (P = 
0.008), consistent with published CGH data (3), estrogen recep- 
tor negative (P = 0.04), and harboring TP53 mutations (P = 
0.0006) (see Table 4, which is published as supporting informa- 
tion on the PNAS web site). 

The improved spatial resolution of our array CGH analysis is 
illustrated for chromosome 8, which displayed extensive DNA 
copy number alteration in our series. A detailed view of the 
variation in the copy number of 241 genes mapping to chromo- 
some 8 revealed multiple regions of recurrent amplification; 
each of these potentially harbors a different known or previously 
uncharacterized oncogene (Fig. 2a). The complexity of amplicon 
structure is most easily appreciated in the breast cancer cell line 
SKBR3. Although a conventional CGH analysis of 8q in SKBR3 
identified only two distinct regions of amplification (12), we 
observed three distinct regions of high-level amplification (la- 
beled 1-3 in Fig. 2b). For each of these regions we can define the 



boundaries of the interval recurrently amplified in the tumors we 
examined; in each case, known or plausible candidate oncogenes 
can be identified (a description of these regions, as well as the 
recurrently amplified regions on chromosomes 17 and 20, can be 
found in Figs. 6 and 7, which are published as supporting 
information on the PNAS web site). 

For a subset of breast cancer cell lines and tumors (4 and 37, 
respectively), and a subset of arrayed genes (6,095), mRNA 
levels were quantitatively measured in parallel by using cDNA 
microarrays (8). The parallel assessment of mRNA levels is 
useful in the interpretation of DNA copy number changes. For 
example, the highly amplified genes that are also highly ex- 
pressed are the strongest candidate oncogenes within an ampli- 
con. Perhaps more significantly, our parallel analysis of DNA 
copy number changes and mRNA levels provides us the oppor- 
tunity to assess the global impact of widespread DNA copy 
number alteration on gene expression in tumor cells. 

A strong influence of DNA copy number on gene expression 
is evident in an examination of the pseudocolor representations 

Fig. 3. Concordance between DNA copy number and gene expression across chromosome 17. DNA copy number alteration (Upper) and mRNA levels (Lower) 
are illustrated for breast cancer cell lines and tumors. Breast cancer cell lines and tumors are separately ordered by hierarchical clustering (Upper), and the 
identical sample order is maintained (Lower). The 354 genes present on the microarrays and mapping to chromosome 17, and for which both DNA copy number 
and mRNA levels were determined, are ordered by position along the chromosome; selected genes are indicated in color-coded text (see Fig. 2 legend). 
Fluorescence ratios (test/reference) are depicted by separate log2 pseudocolor scales (indicated). 



of DNA copy number and mRNA levels for genes on chromo- 
some 17 (Fig. 3). The overall patterns of gene amplification and 
elevated gene expression are quite concordant; i.e., a significant 
fraction of highly amplified genes appear to be correspondingly 
highly expressed. The concordance between high-level amplifi- 
cation and increased gene expression is not restricted to chro- 
mosome 17. Genome-wide, of 117 high-level DNA amplifica- 
tions (fluorescence ratios >4, and representing 91 different 
genes), 62% (representing 54 different genes; see Table 5, which 
is published as supporting information on the PNAS web site) 
are found associated with at least moderately elevated mRNA 
levels (mean-centered fluorescence ratios >2), and 42% (rep- 
resenting 36 different genes) are found associated with compa- 
rably highly elevated mRNA levels (mean-centered fluorescence 
ratios >4). 
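
The concordance figures above amount to a simple count: among measurements with high-level amplification, what fraction also show elevated mRNA? The sketch below shows that count under stated assumptions. The function name and the example values are hypothetical and are not data from the study; only the cutoffs (DNA ratio >4, mean-centered mRNA ratio >2 or >4) come from the text.

```python
def concordance(pairs, dna_cutoff=4.0, mrna_cutoff=2.0):
    """Fraction of high-level amplifications that also show elevated mRNA.

    pairs: iterable of (dna_ratio, mrna_ratio) tuples, one per gene in one
    sample. Returns None when no measurement exceeds dna_cutoff.
    """
    amplified = [(d, m) for d, m in pairs if d > dna_cutoff]
    if not amplified:
        return None
    elevated = sum(1 for _, m in amplified if m > mrna_cutoff)
    return elevated / len(amplified)


# Hypothetical measurements, for illustration only.
example = [(5.0, 3.0), (6.0, 1.2), (8.0, 9.0), (1.1, 0.9), (4.5, 4.2)]
moderate = concordance(example, mrna_cutoff=2.0)  # mRNA ratio > 2
high = concordance(example, mrna_cutoff=4.0)      # mRNA ratio > 4
```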

To determine the extent to which DNA deletion and lower- 
level amplification (in addition to high-level amplification) are 
also associated with corresponding alterations in mRNA levels, 
we performed three separate analyses on the complete data set 
(4 cell lines and 37 tumors, across 6,095 genes). First, we 
determined the average mRNA levels for each of five classes 
of genes, representing DNA deletion, no change, and low-, 
medium-, and high-level amplification (Fig. 4a). For both the 



breast cancer cell lines and tumors, average mRNA levels 
tracked with DNA copy number across all five classes, in a 
statistically significant fashion (P values for pair-wise Student's 
t tests comparing adjacent classes: cell lines, 4 × 10^-49, 1 × 10^-49, 
5 × 10^-[?], 1 × 10^-2; tumors, 1 × 10^-43, 1 × 10^-214, 5 × 10^-41, 
1 × 10^-4). A linear regression of the average log(DNA copy 
number), for each class, against average log(mRNA level) 
demonstrated that on average, a 2-fold change in DNA copy 
number was accompanied by 1.4- and 1.5-fold changes in mRNA 
level for the breast cancer cell lines and tumors, respectively (Fig. 
4a, regression line not shown). Second, we characterized the 
distribution of the 6,095 correlations between DNA copy num- 
ber and mRNA level, each across the 37 tumor samples (Fig. 4b). 
The distribution of correlations forms a normal-shaped curve, 
but with the peak markedly shifted in the positive direction from 
zero. This shift is statistically significant, as evidenced in a plot 
of observed vs. expected correlations (Fig. 4c), and reflects a 
pervasive global influence of DNA copy number alterations on 
gene expression. Notably, the highest correlations between DNA 
copy number and mRNA level (the right tail of the distribution 
in Fig. 4b) comprise both amplified and deleted genes (data not 
shown). Third, we used a linear regression model to estimate the 
fraction of all variation measured in mRNA levels among the 37 
tumors that could be attributed to underlying variation in DNA 
copy number. From this analysis, we estimate that, overall, about 
7% of all of the observed variation in mRNA levels can be 
explained directly by variation in copy number of the altered 
genes (Fig. 4d). We can reduce the effects of experimental 
measurement error on this estimate by using only that fraction 
of the data most reliably measured (fluorescence intensity/ 
background >3); using that data, our estimate of the percent 
variation in mRNA levels directly attributed to variation in gene 
copy number increases to 12% (Fig. 4d). This still undoubtedly 
represents a significant underestimate, as the observed variation 
in global gene expression is affected not only by true variation in 
the expression programs of the tumor cells themselves but also 
by the variable presence of non-tumor cell types within clinical 
samples. 
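
The regression results above can be read as follows. If log(mRNA level) is regressed on log(DNA copy number) with slope b, a 2-fold change in copy number corresponds to a 2^b-fold change in mRNA, so the reported 1.4- and 1.5-fold changes imply slopes of roughly 0.49 and 0.58. The fraction of variation explained corresponds to the R-squared of such a fit. The sketch below is a plain least-squares fit written for illustration; the function names are hypothetical, and the authors' exact regression model is not given in this section.

```python
import math


def linear_fit(x, y):
    """Ordinary least-squares fit y = a + b*x.

    Returns (intercept, slope, r_squared), where r_squared is the fraction
    of the variance in y explained by x.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return intercept, slope, r_squared


def mrna_fold_per_dna_doubling(slope):
    """mRNA fold change implied by a 2-fold DNA change, for a log-log slope."""
    return 2.0 ** slope


# Slopes implied by the reported 1.4-fold (cell lines) and 1.5-fold (tumors).
slope_cell_lines = math.log2(1.4)
slope_tumors = math.log2(1.5)
```
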
Discussion 

This genome-wide, array CGH analysis of DNA copy number 
alteration in a series of human breast tumors demonstrates the 
usefulness of defining amplicon boundaries at high resolution 
(gene-by-gene), and quantitatively measuring amplicon shape, to 
assist in locating and identifying candidate oncogenes. By ana- 
lyzing mRNA levels in parallel, we have also discovered that 
changes in DNA copy number have a large, pervasive, direct 
effect on global gene expression patterns in both breast cancer 
cell lines and tumors. Although the DNA microarrays used in our 
analysis may display a bias toward characterized and/or highly 
expressed genes, because we are examining such a large fraction 
of the genome (approximately 20% of all human genes), and 
because, as detailed above, we are likely underestimating the 
contribution of DNA copy number changes to altered gene 
expression, we believe our findings are likely to be generalizable 
(but would nevertheless still be remarkable if only applicable to 
this set of ~6,100 genes). 

In budding yeast, aneuploidy has been shown to result in 
chromosome-wide gene expression biases (13). Two recent 
studies have begun to examine the global relationship between 
DNA copy number and gene expression in cancer cells. In 
agreement with our findings, Phillips et al. (14) have shown that 
with the acquisition of tumorigenicity in an immortalized pros- 
tate epithelial cell line, new chromosomal gains and losses 
resulted in a statistically significant respective increase and 
decrease in the average expression level of involved genes. In 
contrast, Platzer et al. (15) recently reported that in metastatic 
colon tumors only ~4% of genes within amplified regions were 
found more highly (>2-fold) expressed, when compared with 
normal colonic epithelium. This report differs substantially from 
our finding that 62% of highly amplified genes in breast cancer 
exhibit at least 2-fold increased expression. These contrasting 
findings may reflect methodological differences between the 
[...] the genomic distribution of expressed genes, even within existing 
microarray gene expression data sets, may permit the inference 
of DNA copy number aberration, particularly aneuploidy (where 
gene expression can be averaged across large chromosomal 
regions; see Fig. 3 and supporting information). Fifth, this 
finding implies that a substantial portion of the phenotypic 
uniqueness (and by extension, the heterogeneity in clinical 
behavior) among patients' tumors may be traceable to underly- 
ing variation in DNA copy number. Sixth, this finding supports 
a possible role for widespread DNA copy number alteration in 
tumorigenesis (17, 18), beyond the amplification of specific 
oncogenes and deletion of specific tumor suppressor genes. 
Widespread DNA copy number alteration, and the concomitant 
widespread imbalance in gene expression, might disrupt critical 
stoichiometric relationships in cell metabolism and physiology 
(e.g., proteasome, mitotic spindle), possibly promoting further 
chromosomal instability and directly contributing to tumor 
development or progression. Finally, our findings suggest the 
possibility of cancer therapies that exploit specific or global 
imbalances in gene expression in cancer. 

We thank the many members of the P.O.B. and D.B. labs for helpful 
discussions. J.R.P. was a Howard Hughes Medical Institute Physician 
Postdoctoral Fellow during a portion of this work. P.O.B. is a Howard 
Hughes Medical Institute Associate Investigator. This work was 
supported by grants from the National Institutes of Health, the Howard 
Hughes Medical Institute, the Norwegian Cancer Society, and the 
Norwegian Research Council. 

T., Eisen, M. B., van de Rijn, M., Jeffrey, S. S., et al. (2001) Proc. Natl. Acad. 
Sci. USA 98, 10869-10874. 

10. Schuler, G. D. (1997) J. Mol. Med. 75, 694-698. 

11. Lander, E. S., Linton, L. M., Birren, B., Nusbaum, C., Zody, M. C., Baldwin, 
J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., et al. (2001) Nature 
(London) 409, 860-921. 

12. Fejzo, M. S., Godfrey, T., Chen, C., Waldman, F. & Gray, J. W. (1998) Genes 
Chromosomes Cancer 22, 105-113. 

13. Hughes, T. R., Roberts, C. J., Dai, H., Jones, A. R., Meyer, M. R., Slade, D., 
Burchard, J., Dow, S., Ward, T. R., Kidd, M. J., et al. (2000) Nat. Genet. 25, 
333-337. 

14. Phillips, J. L., Hayward, S. W., Wang, Y., Vasselli, J., Pavlovich, C., Padilla- 
Nash, H., Pezullo, J., Ghadimi, B. M., Grossfeld, G. D., Rivera, A., et al. 
(2001) Cancer Res. 61, 8143-8149. 

15. Platzer, P., Upender, M., Wilson, K., Willis, J., Lutterbaugh, J., Nosrati, A., 
Willson, J., Mack, D., Ried, T. & Markowitz, S. (2002) Cancer Res. 62, 

16. Albertson, D. G., Ylstra, B., Segraves, R., Collins, C., Dairkee, S. H., Kowbel, 
D., Kuo, W. L., Gray, J. W. & Pinkel, D. (2000) Nat. Genet. 25, 
144-146. 

17. Li, R., Yerganian, G., Duesberg, P., Kraemer, A., Willer, A., Rausch, C. & 
Hehlmann, R. (1997) Proc. Natl. Acad. Sci. USA 94, 14506-14511. 

18. Rasnick, D. & Duesberg, P. H. (1999) Biochem. J. 340, 621-630. 



Pollack et al. 

12968 | www.pnas.org/cgi/doi/10.1073/pnas.162471999 



TECHNICAL UPDATE 

FROM YOUR LABORATORY SERVICES PROVIDER 

HER-2/neu Breast Cancer Predictive Testing 

Julie Sanford Hanna, Ph.D. and Dan Mornin, M.D. 



Each year, over 182,000 women in the United States are 
diagnosed with breast cancer, and approximately 45,000 die 
of the disease. 1 Incidence appears to be increasing in the 
United States at a rate of roughly 2% per year. The reasons 
for the increase are unclear, but non-genetic risk factors appear 
to play a large role. 2 

Five-year survival rates range from approximately 65%- 
85%, depending on demographic group, with a significant 
percentage of women experiencing recurrence of their cancer 
within 10 years of diagnosis. One of the factors most predic- 
tive for recurrence once a diagnosis of breast cancer has been 
made is the number of axillary lymph nodes to which tumor 
has metastasized. Most node-positive women are given adju- 
vant therapy, which increases their survival. However, 20%- 
30% of patients without axillary node involvement also 
develop recurrent disease, and the difficulty lies in how to iden- 
tify this high-risk subset of patients. These patients could 
benefit from increased surveillance, early intervention, and 
treatment. 

Prognostic markers currently used in breast cancer recur- 
rence prediction include tumor size, histological grade, steroid 
hormone receptor status, DNA ploidy, proliferative index, and 
cathepsin D status. Expression of growth factor receptors and 
over-expression of the HER-2/neu oncogene have also been 
identified as having value regarding treatment regimen and 
prognosis. 

HER-2/neu (also known as c-erbB2) is an oncogene that 
encodes a transmembrane glycoprotein that is homologous 
to, but distinct from, the epidermal growth factor receptor. 
Numerous studies have indicated that high levels of expres- 
sion of this protein are associated with rapid tumor growth, 
certain forms of therapy resistance, and shorter disease-free 
survival. The gene has been shown to be amplified and/or 
overexpressed in 10%-30% of invasive breast cancers and in 
40%-60% of intraductal breast carcinoma. 3 

There are two distinct FDA-approved methods by which 
HER-2/neu status can be evaluated: immunohistochemistry 
(IHC, HercepTest™) and FISH (fluorescent in situ hybridiza- 
tion, PathVysion™ Kit). Both methods can be performed on 
archived and current specimens. The first method allows visual 
assessment of the amount of HER-2/neu protein present on 
the cell membrane. The latter method allows direct quantifi- 
cation of the level of gene amplification present in the tumor, 
enabling differentiation between low- versus high-amplifica- 
tion. At least one study has demonstrated a difference in 



recurrence risk in women younger than 40 years of age for 
low- versus high-amplified tumors (54.5% compared to 
85.7%); this is compared to a recurrence rate of 16.7% for 
patients with no HER-2/neu gene amplification. 4 HER-2/neu 
status may be particularly important to establish in women with 
small (< 1 cm) tumor size. 

The choice of methodology for determination of HER-2/ 
neu status depends in part on the clinical setting. FDA approval 
for the Vysis FISH test was granted based on clinical trials 
involving 1549 node-positive patients. Patients received one 
of three different treatments consisting of different doses of 
cyclophosphamide, Adriamycin, and 5-fluorouracil (CAF). 
The study showed that patients with amplified HER-2/neu 
benefited from treatment with higher doses of adriamycin- 
based therapy, while those with normal HER-2/neu levels did 
not. The study therefore identified a sub-set of women, who 
because they did not benefit from more aggressive treatment, 
did not need to be exposed to the associated side effects. In 
addition, other evidence indicates that HER-2/neu amplifica- 
tion in node-negative patients can be used as an independent 
prognostic indicator for early recurrence, recurrent disease at 
any time and disease-related death. 5 Demonstration of HER- 
2/neu gene amplification by FISH has also been shown to be 
of value in predicting response to chemotherapy in stage-2 
breast cancer patients. 

Selection of patients for Herceptin® (Trastuzumab) mono- 
clonal antibody therapy, however, is based upon demonstra- 
tion of HER-2/neu protein overexpression using HercepTest™. 
Studies using Herceptin® in patients with metastatic breast 
cancer show an increase in time to disease progression, 
increased response rate to chemotherapeutic agents and a small 
increase in overall survival rate. The FISH assays have not yet 
been approved for this purpose, and studies looking at response 
to Herceptin® in patients with or without gene amplification 
status determined by FISH are in progress. 

In general, FISH and IHC results correlate well. However, 
subsets of tumors are found which show discordant results; 
i.e., protein overexpression without gene amplification or lack 
of protein overexpression with gene amplification. The clini- 
cal significance of such results is unclear. Based on the above 
considerations, HER-2/neu testing at SHMC/PAML will uti- 
lize immunohistochemistry (HercepTest™) as a screen, fol- 
lowed by FISH in IHC-negative cases. Alternatively, either 
method may be ordered individually depending on the clini- 
cal setting or clinician preference. 



AUGUST 1939 



CPT code information 



References 



HER-2/neu via IHC 

88342 (including interpretive report) 

HER-2/neu via FISH 

88271 ×2 Molecular cytogenetics, DNA probe, each 
88274 Molecular cytogenetics, interphase in situ hybrid- 
ization, analyze 25-99 cells 
88291 Cytogenetics and molecular cytogenetics, interpre- 
tation and report 

Procedural Information 

Immunohistochemistry is performed using the FDA-approved 
DAKO antibody kit, Herceptest®. The DAKO kit contains 
reagents required to complete a two-step immunohisto- 
chemical staining procedure for routinely processed, paraffin- 
embedded specimens. Following incubation with the primary 
rabbit antibody to human HER-2/neu protein, the kit employs 
a ready-to-use dextran-based visualization reagent. This re- 
agent consists of both secondary goat anti-rabbit antibody 
molecules with horseradish peroxidase molecules linked to a 
common dextran polymer backbone, thus eliminating the need 
for sequential application of link antibody and peroxidase 
conjugated antibody. Enzymatic conversion of the subse- 
quently added chromogen results in formation of visible 
reaction product at the antigen site. The specimen is then coun- 
terstained; a pathologist using light-microscopy interprets 
results. 

FISH analysis at SHMC/PAML is performed using the 
FDA-approved PathVysion™ HER-2/neu DNA probe kit, pro- 
duced by Vysis, Inc. Formalin-fixed, paraffin-embedded breast 
tissue is processed using routine histological methods, and then 
slides are treated to allow hybridization of DNA probes to the 
nuclei present in the tissue section. The PathVysion™ kit con- 
tains two direct-labeled DNA probes, one specific for the 
alphoid repetitive DNA (CEP 17, spectrum orange) present at 
the chromosome 17 centromere and the second for the HER- 
2/neu oncogene located at 17q11.2-12 (spectrum green). Enu- 
meration of the probes allows a ratio of the number of copies 
of HER-2/neu to the number of copies of chromosome 17 to 
be obtained; this enables quantification of low versus high 
amplification levels, and allows an estimate of the percentage 
of cells with HER-2/neu gene amplification. The clinically 
relevant distinction is whether the gene amplification is due 
to increased gene copy number on the two chromosome 17 
homologues normally present or an increase in the number of 
chromosome 17s in the cells. In the majority of cases, ratio 
equivalents less than 2.0 are indicative of a normal/negative 
result; ratios of 2.1 and over indicate that amplification is 
present and to what degree. Interpretation of this data will be 
performed and reported from the Vysis-certified Cytogenet- 
ics laboratory at SHMC. 
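
The ratio-based reading described above can be sketched in a few lines of Python. This is an illustration only, not the laboratory's procedure or the kit's software. It assumes the ratio is total HER-2/neu signals divided by total CEP 17 signals, which is the direction consistent with the stated cutoffs. The function names are hypothetical, and because the text does not say how ratios between 2.0 and 2.1 are handled, the sketch labels them "indeterminate".

```python
def her2_fish_ratio(her2_signals, cep17_signals):
    """Ratio of HER-2/neu signals to chromosome 17 centromere (CEP 17) signals.

    Both arguments are total signal counts over the same set of scored nuclei.
    """
    if cep17_signals <= 0:
        raise ValueError("CEP 17 signal count must be positive")
    return her2_signals / cep17_signals


def interpret_ratio(ratio):
    """Apply the cutoffs given in the text: <2.0 normal, >=2.1 amplified."""
    if ratio < 2.0:
        return "not amplified"
    if ratio >= 2.1:
        return "amplified"
    return "indeterminate"  # assumption: the text leaves this range undefined


# Hypothetical counts. Extra copies of the whole chromosome raise both
# counts together, so the ratio stays near 1 and the case reads as normal.
gene_amplified = interpret_ratio(her2_fish_ratio(300, 120))
polysomy_only = interpret_ratio(her2_fish_ratio(180, 180))
```

Dividing by the centromere count is what separates true gene amplification from an increase in the number of chromosome 17 copies, the distinction the paragraph above calls clinically relevant.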
Aneuploidy and cancer 

Subrata Sen, PhD 



Numeric aberrations in chromosomes, referred to as aneu- 
ploidy, are commonly observed in human cancer. Whether aneu- 
ploidy is a cause or consequence of cancer has long been 
debated. Three lines of evidence now make a compelling case 
for aneuploidy being a discrete chromosome mutation event 
that contributes to malignant transformation and progression 
process. First, precise assay of chromosome aneuploidy in 
several primary tumors with in situ hybridization and compara- 
tive genomic hybridization techniques have revealed that 
specific chromosome aneusomies correlate with distinct tumor 
phenotypes. Second, aneuploid tumor cell lines and in vitro 
transformed rodent cells have been reported to display an 
elevated rate of chromosome instability, thereby indicating that 
aneuploidy is a dynamic chromosome mutation event associ- 
ated with transformation of cells. Third, and most important, a 
number of mitotic genes regulating chromosome segregation 
have been found mutated in human cancer cells, implicating 
such mutations in induction of aneuploidy in tumors. Some of 
these gene mutations, possibly allowing unequal segregations 
of chromosomes, also cause tumorigenic transformation of 
cells in vitro. In this review, the recent publications investigat- 
ing aneuploidy in human cancers, rate of chromosome instabil- 
ity in aneuploidy tumor cells, and genes implicated in regulat- 
ing chromosome segregation found mutated in cancer cells 

are discussed. Curr Opin Oncol 2000, 12:82-88 © 2000 Lippincott Williams 
& Wilkins, Inc. 

Cancer research over the past decade has firmly estab- 
lished that malignant cells accumulate a large number of 
genetic mutations that affect differentiation, prolifera- 
tion, and cell death processes. In addition, it is also 
recognized that most cancers are clonal, although they 
display extensive heterogeneity with respect to kary- 
otypes and phenotypes of individual clonal populations. 
It is estimated that numeric chromosomal imbalance, 
referred to as aneuploidy, is the most prevalent genetic 
change recorded among over 20,000 solid tumors 
analyzed thus far [1]. Phenotypic diversity of the clonal 
populations in individual tumors involves differences in 
morphology, proliferative properties, antigen expression, 
drug sensitivity, and metastatic potentials. It has been 
proposed that an underlying acquired genetic instability 
is responsible for the multiple mutations detected in 
cancer cells that lead to tumor heterogeneity and 
progression [2]. In a somewhat contradictory argument, 
it has also been suggested that clonal expansion due to 
selection of cells undergoing normal rates of mutation 
can explain malignant transformation and progression 
process in humans [3]. Acquired genetic instability, 
nonetheless, is considered important for more rapid 
progression of the disease [4••]. Although the original 
hypothesis on genetic instability in cancer primarily 
focused on chromosome imbalances in the form of aneu- 
ploidy in tumor cells, the actual relevance of such muta- 
tions in cancer remains a controversial issue. 

Whether or not aneuploidy contributes to the malignant 
transformation and progression process has long been 
debated. A prevalent idea on genetics of cancer referred 
to as "somatic gene mutation hypothesis" contends that 
gene mutations at the nucleotide level alone can cause 
cancer by either activating cellular proto-oncogenes to 
dominant cancer causing oncogenes and/or by inactivat- 
ing growth inhibitory tumor suppressor genes. In this 
scheme of things chromosomal instability in the form of 
aneuploidy is a mere consequence rather than a cause of 
malignant transformation and progression process. 

In this review, some of the recent observations on the 
subject are discussed and compelling evidence is 
provided to suggest that aneuploidy is a distinct form of 
genetic instability in cancer that frequently correlates 
with specific phenotypes and stages of the disease. 
Furthermore, discrete genetic targets affecting chromo- 
somal stability in cancer cells, recently identified, are 
also discussed. These data provide a new direction 
toward elucidating the molecular mechanisms responsi- 
ble for induction of aneuploidy in cancer and may even- 
tually be exploited as novel therapeutic targets in the 
future. 

Genetic alterations in cancer 

Alterations in many genetic loci regulating growth, 
senescence, and apoptosis, identified in tumor cells, 
have led to the current understanding of cancer as a 
genetic disease. The genetic changes identified in 
tumors include: subtle mutations in genes at the 
nucleotide level; chromosomal translocations leading to 
structural rearrangements in genes; and numeric 
changes in either partial segments of chromosomes or 
whole chromosomes (aneuploidy) causing imbalance in 
gene dosage. 

For the purpose of this review, both segmental and whole 
chromosome imbalances leading to altered DNA dosage 
in cancer cells are included as examples of aneuploidy. 

Incidence of aneuploidy in cancer 

Evidence of aneuploidy involving one or more chromosomes 
has been commonly reported in human tumors. 
Although these observations were initially made using 
classic cytogenetic techniques late in a tumor's evolu- 
tion and were difficult to correlate with cancer progres- 
sion, more recent studies have reported association of 
specific nonrandom chromosome aneuploidy with 
different biologic properties such as loss of hormone 
dependence and metastatic potential [5]. 

Classic cytogenetic studies performed on tumor cells 
had serious limitations in scope because they were 
applicable only to those cases in which mitotic chromo- 
somes could be obtained. Because of low spontaneous 
rates of cell division in primary tumors, analyses 
depended on cells either derived selectively from 
advanced metastases or those grown in vitro for variable 
periods of time. In both instances, metaphases analyzed 
represented only a subset of primary tumor cell popula- 
tion. Two major advances in cytogenetic analytic tech- 
niques, in situ hybridization (ISH) and comparative 
genomic hybridization (CGH), have allowed better reso- 
lution of chromosomal aberrations in freshly isolated 
tumor cells [6]. ISH analysis with chromosome-specific 
DNA probes, a powerful adjunct to metaphasic analysis, 
allows assessment of chromosomal anomalies within 
tumor cell populations in the contexts of whole nuclear 
architecture and tissue organization. CGH allows 
genome wide screening of chromosomal anomalies 
without the use of specific probes even in the absence 
of prior knowledge of chromosomes involved. Although 
both techniques have certain limitations in terms of 
their resolution power, they nonetheless provide a 
better approximation of chromosomal changes occurring 
among tumors of various histology, grade, and stage 
compared with what was possible with the classic cytogenetic 
techniques. Genomic ploidy measurements 
have also been performed at the DNA level with flow 
cytometry and cytofluorometric methods. Although 
these assays underestimate chromosome ploidy due to a 
chromosomal gain occasionally masking a chromosomal 
loss in the same cell, several studies using these 
methods have supported the conclusion that DNA 
aneuploidy closely associates with poor prognosis in 
various cancers [7,8]. This discussion of some recent 
examples published on aneuploidy in cancer includes 
discussion of studies dealing with DNA ploidy measure- 
ments as well. Most of these observations are correlative 
without direct proof of specific involvement of genes on 
the respective chromosomes. Identification of putative 
oncogenes and tumor suppressor genes on gained and 
lost chromosomes in aneuploid tumors, however, is 
providing strong evidence that chromosomes involved in 
aneuploidy play a critical role in the tumorigenic 
process. 

In renal tumors, either segmental or whole chromosome 
aneuploidy appears to be uniquely associated with 
specific histologic subtypes [9]. Tumors from patients 
with hereditary papillary renal carcinomas (HPRC) 
commonly show trisomy of chromosome 7, when 
analyzed by CGH. Germline mutations of a putative 
oncogene MET have been detected in patients with 
HPRC. A recent study [10] has demonstrated that an 
extra copy of chromosome 7 results in nonrandom dupli- 
cation of the mutant MET allele in HPRC, thereby 
implicating this trisomy in tumorigenesis. The study 
suggested that mutation of MET may render the cells 
more susceptible to errors in chromosome replication, 
and that clonal expansion of cells harboring duplicated 
chromosome 7 reflects their proliferative advantage. In 
addition to chromosome 7, trisomy of chromosome 17 in 
papillary tumors and also of chromosome 8 in mesoblas- 
tic nephroma are commonly seen. Association of specific 
chromosome imbalances with benign and malignant 
forms of papillary renal tumors, therefore, not only 
contributes to an understanding of tumor origins and 
evolution, but also implicates aneuploidy of the respective 
chromosomes in the tumorigenic transformation 
process. 

In colorectal tumors, chromosome aneuploidy is a 
common occurrence. In fact, molecular allelotyping 
studies have suggested that limited karyotyping data 
available from these tumors actually underestimate the 
true extent of these changes. Losses of heterozygosity 
reflecting loss of the maternal or paternal allele in 
tumors are widespread and often accompanied by a gain 
of the opposite allele. Therefore, for example, a tumor 
could lose a maternal chromosome while duplicating 
the same paternal chromosome, leaving the tumor cell 
with a normal karyotype and ploidy but an aberrant 
allelotype. It has been estimated that cancer of the 
colon, breast, pancreas, or prostate may lose an average 
of 25% of its alleles. It is not unusual to discover that a 
tumor has lost over half of its alleles [4]. In clinical 
settings, DNA ploidy measurements have revealed that 
DNA aneuploidy indicates high risk of developing 
severe premalignant changes in patients with ulcerative 
colitis, who are known to have an increased risk of 
developing colorectal cancer [11]. DNA aneuploidy has 
been found to be one of the useful indicators of lymph 
node metastasis in patients with gastric carcinoma and 
associated with poor outcome compared with diploid 
cases [12,13]. CGH analysis of chromosome aneuploidy, 
on the other hand, was reported to correlate gain 
of chromosome 20q with high tumor S phase fractions 
and loss of 4q with low tumor apoptotic indices [14]. 
Aneuploidy of chromosome 4 in metastatic colorectal 
cancer has recently been confirmed in studies that used 
unbiased DNA fingerprinting with arbitrarily primed 
polymerase chain reactions to detect moderate gains 
and losses of specific chromosomal DNA sequences 
[15]. The molecular karyotype (amplotype) generated 
from colorectal cancer revealed that moderate gains of 
sequences from chromosomes 8 and 13 occurred in 
most tumors, suggesting that overrepresentation of 
these chromosomal regions is a critical step for metasta- 
tic colorectal cancer. 

In addition to being implicated in tumorigenesis and 
correlated with distinct tumor phenotypes, chromosome 
aneuploidy has been used as a marker of risk assessment 
and prognosis in several other cancers. The potential 
value of aneuploidy as a noninvasive tool to identify 
individuals at high risk of developing head and neck 
cancer appears especially promising. Interphase fluores- 
cence in situ hybridization (FISH) revealed extensive 
aneuploidy in tumors from patients with head and neck 
squamous cell carcinomas (HNSCC) and also in clinically 
normal distant oral regions from the same individuals 
[16,17]. It has been proposed that a panel of chromosome 
probes in FISH analyses may serve as an 
important tool to detect subclinical tumorigenesis and 
for diagnosis of residual disease. The presence of aneu- 
ploid or tetraploid populations is seen in 90% to 95% of 
esophageal adenocarcinomas, and when seen in 
conjunction with Barrett's esophagus, a premalignant 
condition, predicts progression of disease [18,19]. 
Chromosome ploidy analyses in conjunction with loss of 
heterozygosity and gene mutation studies in Barrett's 
esophagus reflect evolution of neoplastic cell lineages in 
vivo [20]. Evolution of neoplastic progeny from Barrett's 
esophagus following somatic genetic mutations 
frequently involves bifurcations and loss of heterozygos- 
ity at several chromosomal loci leading to aneuploidy 
and cancer. Accordingly, it is hypothesized that during 
tumor cell evolution diploid cell progenitors with 
somatic genetic abnormalities undergo expansion with 
acquired genetic instability. Such instability, often 
manifested in the form of increased incidence of aneuploidy, 
enters a phase of clonal evolution beginning in 
premalignant cells that proceeds over a period of time 
and occasionally leads to malignant transformation. The 
clonal evolution continues even after the emergence of 
cancer. 

The significance of DNA and chromosome aneuploidy 
in other human cancers continues to be evaluated. 
Among papillary thyroid carcinomas, aneuploid 
DNA content in tumor cells was reported to correlate 
with distant metastases, reflecting worsened progno- 
sis [21]. Genome wide screening of follicular thyroid 
tumors by CGH, on the other hand, revealed frequent 
loss of chromosome 22 in widely invasive follicular 
carcinomas [22]. Chromosome copy number gains in 
invasive neoplasm compared with foci of ductal carci- 
noma in situ (DCIS) with similar histology have been 
proposed to indicate involvement of aneuploidy in 
progression of human breast cancer [23]. ISH analyses 
of cervical intraepithelial neoplasia have provided 
suggestive evidence that chromosomes 1, 7 and X 
aneusomy is associated with progression toward cervi- 
cal carcinoma [24]. 

Although the prognostic value of numeric aberrations 
remains a matter of debate in human hematopoietic 
neoplasia, there have been recent studies to suggest that 
the presence of monosomy 7 defines a distinct subgroup 
of acute myeloid leukemia patients [25]. It is interesting 
in this context that therapy-related myelodysplastic 
syndromes have been reported to display monosomy 5 
and 7 karyotypes, reflecting poor prognosis [26]. 

The clinical observations, mentioned previously, are 
supported by in vitro studies in human and rodent cells in 
which aneuploidy is induced at early stages of transformation 
[27,28]. It has even been suggested that aneuploidy may, 
in some instances, cause cell immortalization, a 
critical step preceding transformation. 

Finally, in an interesting study to develop transgenic 
mouse models of human chromosomal diseases, chromo- 
some segment specific duplication and deletions of the 
genome were reported to be constructed in mouse 
embryonic stem cells [29]. Three duplications for a 
portion of mouse chromosome 11 syntenic with human 
chromosome 17 were established in the mouse 
germline. Mice with 1Mb duplication developed corneal 
hyperplasia and thymic tumors. The findings represent 
the first transgenic mouse model of aneuploidy of a 
defined chromosome segment that documents the direct 
role of chromosome aneusomy in tumorigenesis. 






Aneuploidy as "dynamic cancer-causing 
mutation" instead of a "consequential state" 
in cancer 

According to the hypothesis previously discussed, aneu- 
ploidy represents either a "gain of function" or "loss of 
function" mutation at the chromosome level with a 
causative influence on the tumorigenesis process. The 
hypothesis, however, is based only on circumstantial 
evidence even though existence of aneuploidy is corre- 
lated with different tumor phenotypes. The existence of 
numeric chromosomal alterations in a tumor does not 
mean that the change arose as a dynamic mutation due 
to genomic instability, because several factors could lead 
to consequential aneuploidy in tumors, also. Although 
aneuploidy as a dynamic mutation due to genomic insta- 
bility in tumor cells would occur at a certain measurable 
rate per cell generation, a consequential state of aneu- 
ploidy in tumors may not occur at a predictable rate 
under similar conditions or in tumors with similar 
phenotypes. In addition to genomic instability, differences 
in environmental factors with selective pressure 
could explain high incidence of aneuploidy and other 
somatic mutations in tumors compared with normal cells 
[4]. These include humoral, cell substratum, and cell- 
cell interaction differences between tumor and normal 
cell environments. It could be argued that despite 
similar rates of spontaneous aneuploidy induction in 
normal and tumor cells, the latter are selected to prolif- 
erate due to altered selective pressure in the tumor cell 
environment, whereas the normal cells are eliminated 
through activation of apoptosis. Alternatively, of course, 
one could postulate that selective expression or overex- 
pression of anti-apoptotic proteins or inactivation of 
proapoptotic proteins in tumor cells may counteract 
default induction of apoptosis in G2/M phase cells 
undergoing missegregation of chromosomes. Recent 
demonstration of overexpression of a G2/M phase anti- 
apoptotic protein survivin in cancer cells [30] suggests 
that this protein may favor aberrant progression of aneu- 
ploid transformed cells through mitosis. This would 
then lead to proliferation of aneuploid cell lineages, 
which may undergo clonal evolution. 

To ascertain that aneuploidy is a dynamic mutational 
event, various human tumor cell lines and transformed 
rodent cell lines have been analyzed for the rate of 
aneuploidy induction. Growing the cells under controlled in 
vitro conditions ensures that environmental 
factors do not influence selective proliferation of 
cells with chromosome instability. In one study, 
Lengauer et al. [31•] provided unequivocal evidence by 
FISH analyses that losses or gains of multiple chromosomes 
occurred in excess of 10^-2 per chromosome per 
generation in aneuploid colorectal cancer cell lines. The 
study further concluded that such chromosomal instability 
appeared to be a dominant trait. Using another in 
vitro model system of Chinese hamster embryo (CHE) 
cells, Duesberg et al. [32•] have also obtained similar 
results. With clonal cultures of CHE cells, transformed 
with nongenotoxic chemicals and a mitotic inhibitor, 
these authors demonstrated that the overwhelming 
majority of the transformed colonies contained more 
than 50% aneuploid cells, indicating that aneuploidy 
would have originated from the same cells that under- 
went transformation. All the transformed colonies tested 
were tumorigenic. It was further documented that the 
ploidy factor representing the quotient of the modal 
chromosome number divided by the normal diploid 
number, in each clone, correlated directly with the 
degree of chromosomal instability. Therefore, chromo- 
somal instability was found proportional to the degree of 
aneuploidy in the transformed cells and the authors 
hypothesized that aneuploidy is a unique mechanism of 
simultaneously altering and destabilizing, in a massive 
manner, the normal cellular phenotypes. In the absence 
of any evidence that the transforming chemicals used in 
the study did not induce other somatic mutations, it is 
difficult to rule out the contribution of such mutations 
in the transformation process. These results nonetheless 
make a strong case for aneuploidy being a dynamic chro- 
mosome mutation event intimately associated with 
cancer. 
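
The ploidy factor described above (the modal chromosome number of a clone divided by the normal diploid number) is simple arithmetic and can be sketched as follows. This is only an illustration, not the authors' procedure: the function name and the metaphase counts are hypothetical, and the diploid number of 22 for the Chinese hamster is supplied here as an assumption rather than taken from the text.

```python
from collections import Counter

def ploidy_factor(chromosome_counts, diploid_number):
    """Modal chromosome number of a clone divided by the normal diploid number."""
    if not chromosome_counts:
        raise ValueError("need at least one metaphase count")
    tallies = Counter(chromosome_counts)
    highest = max(tallies.values())
    # If several counts tie for most frequent, take the smallest (arbitrary choice).
    modal = min(n for n, c in tallies.items() if c == highest)
    return modal / diploid_number

# Hypothetical chromosome counts from metaphases of one clone (illustrative only).
clone_counts = [22, 33, 33, 34, 33, 32, 33]
factor = ploidy_factor(clone_counts, diploid_number=22)  # assumed diploid number
print(factor)
```

A factor of 1.0 corresponds to a diploid modal number; values further from 1.0 indicate a greater departure from diploidy, which is the quantity the study related to chromosomal instability.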

Aneuploidy versus somatic gene mutation in 
cancer 

The idea that numeric chromosome imbalance or aneu- 
ploidy is a direct cause of cancer was proposed at the 
turn of the century by Theodor Boveri [33]. However, 
the hypothesis was largely ignored over the last several 
decades in favor of the somatic gene mutation hypothe- 
sis, mentioned earlier. Evidence accumulating in the 
literature lately on specific chromosome aneusomies 
recognized in primary tumors, incidence of aneuploidy 
in cells undergoing transformation, and aneuploid tumor 
cells showing a high rate of chromosome instability has 
led to the rejuvenation of Boveri's hypothesis. The 
concept has recently been discussed as a "vintage wine 
in a new bottle" [34•]. The author points out that 
except for rare cancers caused by dominant retroviral 
oncogenes, diploidy does not seem to occur in solid 
tumors, whereas aneuploidy is a rule rather than excep- 
tion in cancer. 

Aneuploidy as an effective mutagenic mechanism 
driving tumor progression, on the other hand, is being 
recognized as a viable solution to the paradox that with 
known mutation rate in non-germline cells (~10^-7 per 
gene per cell generation) tumor cell lineages cannot 
accumulate enough mutant genes during a human life- 
time [35]. The concept is gaining significant credibility 
since genes that potentially affect chromosome segregation 
were found mutated in human cancer. Some of 
these genes have also been shown to have transforming 
capability in in vitro assays. Selected recent publications 
describing the findings are discussed below in 
reference to the mitotic targets potentially involved in 
inducing chromosome segregation anomalies in cells. 
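
The arithmetic behind the mutation-rate paradox mentioned above can be sketched as follows. Only the per-gene rate comes from the text; the number of cell generations and the number of mutant genes required are hypothetical values chosen for illustration, and the calculation assumes independent mutations with no selection, which is a deliberate simplification.

```python
def prob_all_genes_mutated(rate, generations, genes_required):
    """Chance that each of `genes_required` specific genes has mutated at least
    once in a single cell lineage after `generations` divisions, assuming
    independent mutations and no selection."""
    p_one_gene = 1.0 - (1.0 - rate) ** generations
    return p_one_gene ** genes_required

RATE = 1e-7          # per gene per cell generation, the figure quoted in the text
GENERATIONS = 100    # hypothetical number of divisions in one lineage
GENES_REQUIRED = 4   # hypothetical number of mutant genes needed

p = prob_all_genes_mutated(RATE, GENERATIONS, GENES_REQUIRED)
print(f"{p:.1e}")
```

With these illustrative inputs the probability per lineage is on the order of 10^-20, which shows why a mechanism that alters many genes at once, such as aneuploidy, is proposed as a way around the paradox.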

Potential mitotic targets and molecular 
mechanisms of aneuploidy 

Because aneuploidy represents numeric imbalance in 
chromosomes, it is reasonable to expect that aneuploidy 
arises due to missegregation of chromosomes during cell 
division. There are many potential mitotic targets, 
which could cause unequal segregation of chromosomes 
(Fig. 1). Recent investigations have identified several 
genes involved in regulating these mitotic targets and 
mitotic checkpoint functions, which can be implicated 
in induction of aneuploidy in tumor cells. This discus- 
sion is restricted to those mitotic targets and checkpoint 
genes whose abnormal functioning has been observed in 
cancer or has been shown to cause tumorigenic transfor- 
mation of cells, in recent years. The role of telomeres is 
discussed elsewhere in this issue. For a more detailed 
description of the components of mitotic machinery and 
their possible involvement in causing chromosome 
segregation abnormalities in tumor cells, readers may 
refer to a recently published review [36•]. 

Among the mitotic targets implicated in cancer, centro- 
some defects have been observed in a wide variety of 
malignant human tumors. Centrosomes play a central role 
in organizing the microtubule network in interphase cells 
and mitotic spindle during cell division. Multipolar 
mitotic spindles have been observed in human cancers in 
situ and abnormalities in the form of supernumerary 



Figure 1. Potential mitotic targets causing aneuploidy in 
oncogenesis 
Diagram illustrates that defects in several processes involving chromosomal, 
spindle microtubule, and centrosomal targets, in addition to abnormal cytokinesis, 
may cause unequal partitioning of chromosomes during mitosis, leading to 
aneuploidy. Recently obtained evidence in favor of some of these possibilities is 
discussed in the text. 



centrosomes, centrosomes of aberrant size and shape as 
well as aberrant phosphorylation of centrosome proteins 
have been reported in prostate, colon, brain, and breast 
tumors [37,38]. In view of the findings that abnormal 
centrosomes retain the ability to nucleate microtubules in 
vitro, it is conceivable that cells with abnormal centrosomes 
may missegregate chromosomes, producing aneuploid 
cells. The molecular and genetic bases of abnormal 
centrosome generation and the precise pathway through 
which they regulate the chromosome segregation process 
remain to be elucidated. Recent discovery of a centrosome-associated 
kinase STK15/BTAK/aurora2, naturally 
amplified and overexpressed in human cancers, has raised 
the interesting possibility that aberrant expression of this 
kinase is critically involved in abnormal centrosome func- 
tion and unequal chromosome segregation in tumor cells 
[39,40]. Exogenous expression of the kinase in rodent and 
human cells was found to correlate with an abnormal 
number of centrosomes, unequal partitioning of chromo- 
somes during division, and tumorigenic transformation of 
cells. It is relevant in this context to mention that the 
Xenopus homologue of human STK15/BTAK/aurora2 
kinase has recently been shown to phosphorylate a microtubule 
motor protein XlEg5, the human orthologue of 
which is known to participate in the centrosome separation 
during mitosis [41]. Findings on STK15/aurora2 
kinase, thus, provide an interesting lead to a possible 
molecular mechanism of the centrosome's role in oncogenesis. 
Centrosomes have, of late, been implicated in oncogenesis 
from studies revealing supernumerary centrosomes 
in p53-deficient fibroblasts and overexpression of 
another centrosome kinase PLK1 being detected in 
human non-small cell lung cancer [42]. 

One of the critical events that ensures equal partition- 
ing of the chromosomes during mitosis is the proper 
and timely separation of sister chromatids that are 
attached to each other and to the mitotic spindle. 
Untimely separation of sister chromatids has been 
suspected as a cause of aneuploidy in human tumors. 
Cohesion between sister chromatids is established 
during replication of chromosomes and is retained until 
the next metaphase/anaphase transition. It has been 
shown that during metaphase-anaphase transition, the 
anaphase promoting complex/cyclosome triggers the 
degradation of a group of proteins called securins that 
inhibit sister chromatid separation. A vertebrate securin 
(v-securin) has recently been identified that inhibits 
sister chromatid separation and is involved in transfor- 
mation and tumorigenesis. Subsequent analysis 
revealed that the human securin is identical to the 
product of the gene called pituitary tumor transforming 
gene, which is overexpressed in some tumors and 
exhibits transforming activity in NIH3T3 cells. It is 
proposed that elevated expression of the v-securin may 
contribute to generation of malignant tumors due to 
chromosome gain or loss produced by errors in chromatid 
separation [43•]. 

Normal progression through mitosis during prophase to 
anaphase transition is monitored at least at two check- 
points. One checkpoint operates during early prophase 
at G2 to metaphase progression while the second 
ensures proper segregation of chromosomes during 
metaphase to anaphase transition. Several mitotic 
checkpoint genes responding to mitotic spindle defects 
have been identified in yeast. The metaphase-anaphase 
transition is delayed following activation of this check- 
point during which kinetochores remain unattached to 
the spindle. The signal is transmitted through a kinetochore 
protein complex consisting of Mps1p and several 
Mad and Bub proteins [44]. It is expected that for 
unequal chromosome segregation to be perpetuated 
through cell proliferation cycles giving rise to aneu- 
ploidy, checkpoint controls have to be abrogated. 

Following this logic, Vogelstein et al. [45•] hypothesized 
that aneuploid tumors would reveal mutation in mitotic 
spindle checkpoint genes. Subsequent studies by these 
investigators have proven the validity of this hypothesis 
and a small fraction of human colorectal cancers have 
revealed the presence of mutations in either hBUB1 or 
hBUBR1 checkpoint genes. It was further revealed that 
mutant BUB1 could function in a dominant negative 
manner conferring an abnormal spindle checkpoint 
when expressed exogenously. Inactivation of spindle 
checkpoint function in virally induced leukemia has also 
recently been documented following the finding that 
hMAD1 checkpoint protein is targeted by the Tax 
protein of the human T-cell leukemia virus type 1. 
Abrogation of hMAD1 function leads to multinucleation 
and aneuploidy [46]. 

In addition to mitotic spindle checkpoint defects, failed 
DNA damage checkpoint function in yeast is frequently 
associated with aberrant chromosome segregation as 
well. It, therefore, appears intriguing yet relevant that 
the human BRCA1 gene, proposed to be involved in 
DNA damage checkpoint function, when mutated by a 
targeted deletion of exon 11 led to defective G2/M cell 
cycle checkpoint function and genetic instability in 
mouse embryonic fibroblasts [47]. The cells revealed 
multiple functional centrosomes and unequal chromo- 
some segregation and aneuploidy. Although the molecu- 
lar basis for these abnormalities is not known at this 
time, it raises the interesting possibility that such an 
aneuploidy-driven mechanism may be involved in 
tumorigenesis in individuals carrying germline mutations 
of BRCA1 gene. 

Conclusion 

Growing evidence from human tumor cytogenetic investigations 
strongly suggests that aneuploidy is associated 
with the development of tumor phenotypes. Clinical 
findings of correlation between aneuploidy and tumori- 
genesis are supported by studies with in vitro grown 
transformed cell lines. Molecular genetic analyses of 
tumor cells provide credible evidence that mutations in 
genes controlling chromosome segregation during 
mitosis play a critical role in causing chromosome insta- 
bility leading to aneuploidy in cancer. Further elucida- 
tion of molecular and physiologic bases of chromosome 
instability and aneuploidy induction could lead to the 
development of new therapeutic approaches for 
common forms of cancer. 

ABSTRACT Wnt family members are critical to many 
developmental processes, and components of the Wnt signaling 
pathway have been linked to tumorigenesis in familial and 
sporadic colon carcinomas. Here we report the identification 
of two genes, WISP-1 and WISP-2, that are up-regulated in the 
mouse mammary epithelial cell line C57MG transformed by 
Wnt-1, but not by Wnt-4. Together with a third related gene, 
WISP-3, these proteins define a subfamily of the connective 
tissue growth factor family. Two distinct systems demonstrated 
WISP induction to be associated with the expression of 
Wnt-1. These included (i) C57MG cells infected with a Wnt-1 
retroviral vector or expressing Wnt-1 under the control of a 
tetracycline repressible promoter, and (ii) Wnt-1 transgenic 
mice. The WISP-1 gene was localized to human chromosome 
8q24.1-8q24.3. WISP-1 genomic DNA was amplified in colon 
cancer cell lines and in human colon tumors and its RNA 
overexpressed (2- to >30-fold) in 84% of the tumors examined 
compared with patient-matched normal mucosa. WISP-3 
mapped to chromosome 6q22-6q23 and also was overexpressed 
(4- to >40-fold) in 63% of the colon tumors analyzed. 
In contrast, WISP-2 mapped to human chromosome 20q12-20q13 
and its DNA was amplified, but RNA expression was 
reduced (2- to >30-fold) in 79% of the tumors. These results 
suggest that the WISP genes may be downstream of Wnt-1 
signaling and that aberrant levels of WISP expression in colon 
cancer may play a role in colon tumorigenesis. 

Wnt-1 is a member of an expanding family of cysteine-rich, 
glycosylated signaling proteins that mediate diverse develop- 
mental processes such as the control of cell proliferation, 
adhesion, cell polarity, and the establishment of cell fates (1, 
2). Wnt-1 originally was identified as an oncogene activated by 
the insertion of mouse mammary tumor virus in virus-induced 
mammary adenocarcinomas (3, 4). Although Wnt-1 is not 
expressed in the normal mammary gland, expression of Wnt-1 
in transgenic mice causes mammary tumors (5). 

In mammalian cells, Wnt family members initiate signaling 
by binding to the seven-transmembrane spanning Frizzled 
receptors and recruiting the cytoplasmic protein Dishevelled 
(Dsh) to the cell membrane (1, 2, 6). Dsh then inhibits the 
kinase activity of the normally constitutively active glycogen 
synthase kinase-3β (GSK-3β) resulting in an increase in 
β-catenin levels. Stabilized β-catenin interacts with the transcription 
factor TCF/Lef1, forming a complex that appears in 
the nucleus and binds TCF/Lef1 target DNA elements to 
activate transcription (7, 8). Other experiments suggest that 
the adenomatous polyposis coli (APC) tumor suppressor gene 
also plays an important role in Wnt signaling by regulating 
β-catenin levels (9). APC is phosphorylated by GSK-3β, binds 
to β-catenin, and facilitates its degradation. Mutations in 
either APC or β-catenin have been associated with colon 
carcinomas and melanomas, suggesting these mutations con- 
tribute to the development of these types of cancer, implicating 
the Wnt pathway in tumorigenesis (1). 

Although much has been learned about the Wnt signaling 
pathway over the past several years, only a few of the tran- 
scriptionally activated downstream components activated by 
Wnt have been characterized. Those that have been described 
cannot account for all of the diverse functions attributed to 
Wnt signaling. Among the candidate Wnt target genes are 
those encoding the nodal-related 3 gene, Xnr3, a member of 
the transforming growth factor (TGF)-β superfamily, and the 
homeobox genes, engrailed, goosecoid, twin (Xtwn), and siamois 
(2). A recent report also identifies c-myc as a target gene of the 
Wnt signaling pathway (10). 

To identify additional downstream genes in the Wnt signal- 
ing pathway that are relevant to the transformed cell pheno- 
type, we used a PCR-based cDNA subtraction strategy, sup- 
pression subtractive hybridization (SSH) (11), using RNA 
isolated from C57MG mouse mammary epithelial cells and 
C57MG cells stably transformed by a Wnt-1 retrovirus. Over- 
expression of Wnt-1 in this cell line is sufficient to induce a 
partially transformed phenotype, characterized by elongated 
and refractile cells that lose contact inhibition and form a 
multilayered array (12, 13). We reasoned that genes differen- 
tially expressed between these two cell lines might contribute 
to the transformed phenotype. 

In this paper, we describe the cloning and characterization 
of two genes up-regulated in Wnt-1 transformed cells, WISP-1 
and WISP-2, and a third related gene, WISP-3. The WISP genes 
are members of the CCN family of growth factors, which 
includes connective tissue growth factor (CTGF), Cyr61, and 
nov, a family not previously linked to Wnt signaling. 

MATERIALS AND METHODS 

SSH. SSH was performed by using the PCR-Select cDNA 
Subtraction Kit (CLONTECH). Tester double-stranded 
cDNA was synthesized from 2 μg of poly(A)+ RNA isolated 
from the C57MG/Wnt-1 cell line and driver cDNA from 2 μg 
of poly(A)+ RNA from the parent C57MG cells. The subtracted 
cDNA library was subcloned into a pGEM-T vector for 
further analysis. 

Abbreviations: TGF, transforming growth factor; CTGF, connective 
tissue growth factor; SSH, suppression subtractive hybridization; 
VWC, von Willebrand factor type C module. 

cDNA Library Screening. Clones encoding full-length 
mouse WISP-1 were isolated by screening a λgt10 mouse 
embryo cDNA library (CLONTECH) with a 70-bp probe from 
the original partial clone 568 sequence corresponding to amino 
acids 128-169. Clones encoding full-length human WISP-1 
were isolated by screening λgt10 lung and fetal kidney cDNA 
libraries with the same probe at low stringency. Clones encoding 
full-length mouse and human WISP-2 were isolated by 
screening a C57MG/Wnt-1 or human fetal lung cDNA library 
with a probe corresponding to nucleotides 1463-1512. Full-length 
cDNAs encoding WISP-3 were cloned from human 
bone marrow and fetal kidney libraries. 

Expression of Human WISP RNA. PCR amplification of 
first-strand cDNA was performed with human Multiple Tissue 
cDNA panels (CLONTECH) and 300 μM of each dNTP at 
94°C for 1 sec, 62°C for 30 sec, 72°C for 1 min, for 22-32 cycles. 
WISP and glyceraldehyde-3-phosphate dehydrogenase primer 
sequences are available on request. 

In Situ Hybridization. 33P-labeled sense and antisense riboprobes 
were transcribed from an 897-bp PCR product corresponding 
to nucleotides 601-1440 of mouse WISP-1 or a 
294-bp PCR product corresponding to nucleotides 82-375 of 
mouse WISP-2. All tissues were processed as described (40). 

Radiation Hybrid Mapping. Genomic DNA from each 
hybrid in the Stanford G3 and Genebridge4 Radiation Hybrid 
Panels (Research Genetics, Huntsville, AL) and human and 
hamster control DNAs were PCR-amplified, and the results 
were submitted to the Stanford or Massachusetts Institute of 
Technology web servers. 

Cell Lines, Tumors, and Mucosa Specimens. Tissue speci- 
mens were obtained from the Department of Pathology (Uni- 
versity of Pittsburgh) for patients undergoing colon resection 
and from the University of Leeds, United Kingdom. Genomic 
DNA was isolated (Qiagen) from the pooled blood of 10 
normal human donors, surgical specimens, and the following 
ATCC human cell lines: SW480, COLO 320DM, HT-29, 
WiDr, and SW403 (colon adenocarcinomas), SW620 (lymph 
node metastasis, colon adenocarcinoma), HCT 116 (colon 
carcinoma), SK-CO-1 (colon adenocarcinoma, ascites), and 
HM7 (a variant of ATCC colon adenocarcinoma cell line LS 
174T). DNA concentration was determined by using Hoechst 
dye 33258 intercalation fluorimetry. Total RNA was prepared 
by homogenization in 7 M GuSCN followed by centrifugation 
over CsCl cushions or prepared by using RNAzol. 

Gene Amplification and RNA Expression Analysis. Relative 
gene amplification and RNA expression of WISPs and c-myc in 
the cell lines, colorectal tumors, and normal mucosa were 
determined by quantitative PCR. Gene-specific primers and 
fluorogenic probes (sequences available on request) were 
designed and used to amplify and quantitate the genes. The 
relative gene copy number was derived by using the formula 
2^(ΔCt), where ΔCt represents the difference in amplification 
cycles required to detect the WISP genes in peripheral blood 
lymphocyte DNA compared with colon tumor DNA or colon 
tumor RNA compared with normal mucosal RNA. The 
δ-method was used for calculation of the SE of the gene copy 
number or RNA expression level. The WISP-specific signal was 
normalized to that of the glyceraldehyde-3-phosphate dehy- 
drogenase housekeeping gene. All TaqMan assay reagents 
were obtained from Perkin-Elmer Applied Biosystems. 
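
The copy-number calculation described above can be illustrated with a minimal Python sketch. It assumes that each target Ct is first normalized to the GAPDH Ct of the same sample and that the SE of ΔCt is propagated by a first-order delta-method approximation. The function names and the Ct values are hypothetical and are not taken from the paper.

```python
import math


def relative_level(ct_target_ref, ct_gapdh_ref, ct_target_sample, ct_gapdh_sample):
    """Return 2**dCt, where dCt is the normalized reference Ct minus the
    normalized sample Ct (assumed normalization: target Ct - GAPDH Ct)."""
    delta_ct = (ct_target_ref - ct_gapdh_ref) - (ct_target_sample - ct_gapdh_sample)
    return 2.0 ** delta_ct


def delta_method_se(delta_ct, se_delta_ct):
    """First-order delta-method SE for f(x) = 2**x, i.e. |f'(x)| * SE(x)."""
    return math.log(2.0) * (2.0 ** delta_ct) * se_delta_ct


# Hypothetical Ct values: the target is detected two cycles earlier in the
# sample than in the reference after GAPDH normalization.
fold = relative_level(26.0, 20.0, 24.0, 20.0)
se = delta_method_se(2.0, 0.1)
print(fold, se)
```

A sample that reaches threshold two cycles earlier than the reference corresponds to a 4-fold relative level under this formula.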

RESULTS 

Isolation of WISP-1 and WISP-2 by SSH. To identify Wnt- 
1-inducible genes, we used the technique of SSH using the 
mouse mammary epithelial cell line C57MG and C57MG cells 
that stably express Wnt-1 (11). Candidate differentially ex- 
pressed cDNAs (1,384 total) were sequenced. Thirty-nine 
percent of the sequences matched known genes or homo- 
logues, 32% matched expressed sequence tags, and 29% had 
no match. To confirm that the transcript was differentially 
expressed, semiquantitative reverse transcription-PCR and 
Northern analysis were performed by using mRNA from the 
C57MG and C57MG/Wnt-1 cells. 

Two of the cDNAs, WISP-1 and WISP-2, were differentially 
expressed, being induced in the C57MG/Wnt-1 cell line, but 
not in the parent C57MG cells or C57MG cells overexpressing 
Wnt-4 (Fig. 1 A and B). Wnt-4, unlike Wnt-1, does not induce 
the morphological transformation of C57MG cells and has no 
effect on β-catenin levels (13, 14). Expression of WISP-1 was 
up-regulated approximately 3-fold in the C57MG/Wnt-1 cell 
line and WISP-2 by approximately 5-fold by both Northern 
analysis and reverse transcription-PCR. 

An independent, but similar, system was used to examine 
WISP expression after Wnt-1 induction. C57MG cells express- 
ing the Wnt-1 gene under the control of a tetracycline- 
repressible promoter produce low amounts of Wnt-1 in the 
repressed state but show a strong induction of Wnt-1 mRNA 
and protein within 24 hr after tetracycline removal (8). The 
levels of Wnt-1 and WISP RNA isolated from these cells at 
various times after tetracycline removal were assessed by 
quantitative PCR. Strong induction of Wnt-1 mRNA was seen 
as early as 10 hr after tetracycline removal. Induction of WISP 
mRNA (2- to 6-fold) was seen at 48 and 72 hr (data not shown). 
These data support our previous observations that show that 
WISP induction is correlated with Wnt-1 expression. Because 
the induction is slow, occurring after approximately 48 hr, the 
induction of WISPs may be an indirect response to Wnt-1 
signaling. 

cDNA clones of human WISP-1 were isolated and the 
sequence compared with mouse WISP-1. The cDNA sequences 
of mouse and human WISP-1 were 1,766 and 2,830 bp in length, 
respectively, and encode proteins of 367 aa, with predicted 
relative molecular masses of ≈40,000 (Mr 40 K). Both have 
hydrophobic N-terminal signal sequences, 38 conserved cys- 
teine residues, and four potential N-linked glycosylation sites 
and are 84% identical (Fig. 2A). 

Full-length cDNA clones of mouse and human WISP-2 were 
1,734 and 1,293 bp in length, respectively, and encode proteins 
of 251 and 250 aa, respectively, with predicted relative molec- 
ular masses of ≈27,000 (Mr 27 K) (Fig. 2B). Mouse and human 
WISP-2 are 73% identical. Human WISP-2 has no potential 
N-linked glycosylation sites, and mouse WISP-2 has one at 

Fig. 1. WISP-1 and WISP-2 are induced by Wnt-1, but not Wnt-4, 
expression in C57MG cells. Northern analysis of WISP-1 (A) and 
WISP-2 (B) expression in C57MG, C57MG/Wnt-1, and C57MG/ 
Wnt-4 cells. Poly(A)+ RNA (2 μg) was subjected to Northern blot 
analysis and hybridized with a 70-bp mouse WISP-1-specific probe 
(amino acids 278-300) or a 190-bp WISP-2-specific probe (nucleotides 
1438-1627) in the 3' untranslated region. Blots were rehybridized with 
human β-actin probe. 



Cell Biology, Medical Sciences: Pennica et al. 



Proc. Natl. Acad. Sci. USA 95 (1998) 14719 



Fig. 2. Encoded amino acid sequence alignment of mouse and 
human WISP-1 (A) and mouse and human WISP-2 (B). The potential 
signal sequence, insulin-like growth factor-binding protein (IGF-BP), 
VWC, thrombospondin (TSP), and C-terminal (CT) domains are 
underlined. 

position 197. WISP-2 has 28 cysteine residues that are con- 
served among the 38 cysteines found in WISP-1. 

Identification of WISP-3. To search for related proteins, we 
screened expressed sequence tag (EST) databases with the 
WISP-1 protein sequence and identified several ESTs as 
potentially related sequences. We identified a homologous 
protein that we have called WISP-3. A full-length human 
WISP-3 cDNA of 1,371 bp was isolated corresponding to those 
ESTs that encode a 354-aa protein with a predicted molecular 
mass of 39,293. WISP-3 has two potential N-linked glycosyl- 
ation sites and 36 cysteine residues. An alignment of the three 
human WISP proteins shows that WISP-1 and WISP-3 are the 
most similar (42% identity), whereas WISP-2 has 37% identity 
with WISP-1 and 32% identity with WISP-3 (Fig. 3A). 
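
Percent identity values such as those quoted above are derived from a pairwise alignment. The following Python sketch shows one way to compute the figure, assuming that identity is counted over aligned positions where neither sequence has a gap. The convention actually used by the authors is not stated, and the two aligned strings below are hypothetical, not WISP sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free aligned columns in which both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns containing a gap (assumed convention)
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical toy alignment: 7 gap-free columns, 6 of them identical.
identity = percent_identity("MRW-LLPW", "MRWFLAPW")
print(round(identity, 1))
```

Other denominators (for example, the length of the shorter sequence) give somewhat different percentages, so the convention should be stated whenever such numbers are compared.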

WISPs Are Homologous to the CTGF Family of Proteins. 
Human WISP-1, WISP-2, and WISP-3 are novel sequences; 
however, mouse WISP-1 is the same as the recently identified 
Elm1 gene. Elm1 is expressed in low, but not high, metastatic 
mouse melanoma cells, and suppresses the in vivo growth and 
metastatic potential of K-1735 mouse melanoma cells (15). 
Human and mouse WISP-2 are homologous to the recently 
described rat gene, rCop-1 (16). Significant homology (36- 
44%) was seen to the CCN family of growth factors. This family 
includes three members, CTGF, Cyr61, and the protoonco- 
gene nov. CTGF is a chemotactic and mitogenic factor for 
fibroblasts that is implicated in wound healing and fibrotic 
disorders and is induced by TGF-β (17). Cyr61 is an extracel- 
lular matrix signaling molecule that promotes cell adhesion, 
proliferation, migration, angiogenesis, and tumor growth (18, 
19). nov (nephroblastoma overexpressed) is an immediate 
early gene associated with quiescence and found altered in 
Wilms tumors (20). The proteins of the CCN family share 
functional, but not sequence, similarity to Wnt-1. All are 
secreted, cysteine-rich heparin binding glycoproteins that as- 
sociate with the cell surface and extracellular matrix. 

WISP proteins exhibit the modular architecture of the CCN 
family, characterized by four conserved cysteine-rich domains 
(Fig. 3B) (21). The N-terminal domain, which includes the first 
12 cysteine residues, contains a consensus sequence (GCGC- 
CXXC) conserved in most insulin-like growth factor (IGF)- 
Fig. 3. (A) Encoded amino acid sequence alignment of human 
WISPs. The cysteine residues of WISP-1 and WISP-2 that are not 
present in WISP-3 are indicated with a dot. (B) Schematic represen- 
tation of the WISP proteins showing the domain structure and cysteine 
residues (vertical lines). The four cysteine residues in the VWC domain 
that are absent in WISP-3 are indicated with a dot. (C) Expression of 
WISP mRNA in human tissues. PCR was performed on human 
multiple-tissue cDNA panels (CLONTECH) from the indicated adult 
and fetal tissues. 

binding proteins (BP). This sequence is conserved in WISP-2 
and WISP-3, whereas WISP-1 has a glutamine in the third 
position instead of a glycine. CTGF recently has been shown 
to specifically bind IGF (22) and a truncated nov protein 
lacking the IGF-BP domain is oncogenic (23). The von Wil- 
lebrand factor type C module (VWC), also found in certain 
collagens and mucins, covers the next 10 cysteine residues, and 
is thought to participate in protein complex formation and 
oligomerization (24). The VWC domain of WISP-3 differs 
from all CCN family members described previously, in that it 
contains only six of the 10 cysteine residues (Fig. 3 A and B). 
A short variable region follows the VWC domain. The third 
module, the thrombospondin (TSP) domain is involved in 
binding to sulfated glycoconjugates and contains six cysteine 
residues and a conserved WSxCSxxCG motif first identified in 
thrombospondin (25). The C-terminal (CT) module contain- 
ing the remaining 10 cysteines is thought to be involved in 
dimerization and receptor binding (26). The CT domain is 
present in all CCN family members described to date but is 
absent in WISP-2 (Fig. 3 A and B). The existence of a putative 
signal sequence and the absence of a transmembrane domain 
suggest that WISPs are secreted proteins, an observation 
supported by an analysis of their expression and secretion from 
mammalian cell and baculovirus cultures (data not shown). 
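
The two consensus motifs named above (GCGCCXXC in the IGF-BP domain and WSxCSxxCG in the TSP domain) can be located in a protein sequence with a simple pattern search. The Python sketch below treats X/x as any single residue and, as an assumption based on the text, allows glutamine as well as glycine at the third position of the IGF-BP motif to accommodate WISP-1. The test sequence is a hypothetical toy string, not a real WISP sequence.

```python
import re

# "." stands for any residue (X/x in the consensus notation).
IGFBP_MOTIF = re.compile(r"GC[GQ]CC..C")   # GCGCCXXC, with Q allowed at position 3
TSP_MOTIF = re.compile(r"WS.CS..CG")       # WSxCSxxCG


def find_motif(sequence, pattern):
    """Return (0-based start, matched text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in pattern.finditer(sequence)]


# Hypothetical toy sequence containing one copy of each motif.
toy = "MKLAGCGCCRVCAQQWSPCSKTCGTT"
igfbp_hits = find_motif(toy, IGFBP_MOTIF)
tsp_hits = find_motif(toy, TSP_MOTIF)
print(igfbp_hits, tsp_hits)
```

A match to a short consensus is only suggestive; domain assignment in practice also relies on the spacing of the conserved cysteines across the whole module.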

Expression of WISP mRNA in Human Tissues. Tissue- 
specific expression of human WISPs was characterized by PCR 
analysis on adult and fetal multiple tissue cDNA panels. 
WISP-1 expression was seen in the adult heart, kidney, lung, 
pancreas, placenta, ovary, small intestine, and spleen (Fig. 3C). 
Little or no expression was detected in the brain, liver, skeletal 
muscle, colon, peripheral blood leukocytes, prostate, testis, or 
thymus. WISP-2 had a more restricted tissue expression and 
was detected in adult skeletal muscle, colon, ovary, and fetal 
lung. Predominant expression of WISP-3 was seen in adult 
kidney and testis and fetal kidney. Lower levels of WISP-3 
expression were detected in placenta, ovary, prostate, and 
small intestine. 

In Situ Localization of WISP-1 and WISP-2. Expression of 
WISP-1 and WISP-2 was assessed by in situ hybridization in 
mammary tumors from Wnt-1 transgenic mice. Strong expres- 
sion of WISP-1 was observed in stromal fibroblasts lying within 
the fibrovascular tumor stroma (Fig. 4 A-D). However, low- 
level WISP-1 expression also was observed focally within tumor 
cells (data not shown). No expression was observed in normal 
breast. Like WISP-1, WISP-2 expression also was seen in the 
tumor stroma in breast tumors from Wnt-1 transgenic animals 
(Fig. 4 E-H). However, WISP-2 expression in the stroma was 
in spindle-shaped cells adjacent to capillary vessels, whereas 




Fig. 4. (A, C, E, and G) Representative hematoxylin/eosin-stained 
images from breast tumors in Wnt-1 transgenic mice. The correspond- 
ing dark-field images showing WISP-1 expression are shown in B and 
D. The tumor is a moderately well-differentiated adenocarcinoma 
showing evidence of adenoid cystic change. At low power (A and B), 
expression of WISP-1 is seen in the delicate branching fibrovascular 
tumor stroma (arrowhead). At higher magnification, expression is seen 
in the stromal(s) fibroblasts (C and D), and tumor cells are negative. 
Focal expression of WISP-1, however, was observed in tumor cells in 
some areas. Images of WISP-2 expression are shown in E-H. At low 
power (E and F), expression of WISP-2 is seen in cells lying within the 
fibrovascular tumor stroma. At higher magnification, these cells 
appeared to be adjacent to capillary vessels whereas tumor cells are 
negative (G and H). 



the predominant cell type expressing WISP-1 was the stromal 
fibroblasts. 

Chromosome Localization of the WISP Genes. The chro- 
mosomal location of the human WISP genes was determined 
by radiation hybrid mapping panels. WISP-1 is approximately 
3.48 cR from the meiotic marker AFM259xc5 [logarithm of 
odds (lod) score 16.31] on chromosome 8q24.1 to 8q24.3, in the 
same region as the human locus of the novH family member 
(27) and roughly 4 Mbs distal to c-myc (28). Preliminary fine 
mapping indicates that WISP-1 is located near D8S1712 STS. 
WISP-2 is linked to the marker SHGC-33922 (lod = 1,000) on 
chromosome 20q12-20q13.1. Human WISP-3 mapped to chro- 
mosome 6q22-6q23 and is linked to the marker AFM211ze5 
(lod = 1,000). WISP-3 is approximately 18 Mbs proximal to 
CTGF and 23 Mbs proximal to the human cellular oncogene 
MYB (27, 29). 

Amplification and Aberrant Expression of WISPs in Human 
Colon Tumors. Amplification of protooncogenes is seen in 
many human tumors and has etiological and prognostic sig- 
nificance. For example, in a variety of tumor types, c-myc 
amplification has been associated with malignant progression 
and poor prognosis (30). Because WISP-1 resides in the same 
general chromosomal location (8q24) as c-myc, we asked 
whether it was a target of gene amplification, and, if so, 
whether this amplification was independent of the c-myc locus. 
Genomic DNA from human colon cancer cell lines was 
assessed by quantitative PCR and Southern blot analysis (Fig. 
5 A and B). Both methods detected similar degrees of WISP-1 
amplification. Most cell lines showed significant (2- to 4-fold) 
amplification, with the HT-29 and WiDr cell lines demonstrat- 
ing an 8-fold increase. Significantly, the pattern of amplifica- 
tion observed did not correlate with that observed for c-myc, 
indicating that the c-myc gene is not part of the amplicon that 
involves the WISP-1 locus. 

We next examined whether the WISP genes were amplified 
in a panel of 25 primary human colon adenocarcinomas. The 
relative WISP gene copy number in each colon tumor DNA 
was compared with pooled normal DNA from 10 donors by 
quantitative PCR (Fig. 6). The copy number of WISP-1 and 
WISP-2 was significantly greater than one, approximately 
2-fold for WISP-1 in about 60% of the tumors and 2- to 4-fold 
for WISP-2 in 92% of the tumors (P < 0.001 for each). The 
copy number for WISP-3 was indistinguishable from one (P = 
0.166). In addition, the copy number of WISP-2 was signifi- 
cantly higher than that of WISP-1 (P < 0.001). 

The levels of WISP transcripts in RNA isolated from 19 
adenocarcinomas and their matched normal mucosa were 
Fig. 5. Amplification of WISP-1 genomic DNA in colon cancer cell 
lines. (A) Amplification in cell line DNA was determined by quanti- 
tative PCR. (B) Southern blots containing genomic DNA (10 μg) 
digested with EcoRI (WISP-1) or XbaI (c-myc) were hybridized with 
a 100-bp human WISP-1 probe (amino acids 186-219) or a human 
c-myc probe (located at bp 1901-2000). The WISP and myc genes are 
detected in normal human genomic DNA after a longer film exposure. 

Fig. 6. Genomic amplification of WISP genes in human colon 
tumors. The relative gene copy number of the WISP genes in 25 
adenocarcinomas was assayed by quantitative PCR, by comparing 
DNA from primary human tumors with pooled DNA from 10 healthy 
donors. The data are means ± SEM from one experiment done in 
triplicate. The experiment was repeated at least three times. 

assessed by quantitative PCR (Fig. 7). The level of WISP-1 
RNA present in tumor tissue varied but was significantly 
increased (2- to >25-fold) in 84% (16/19) of the human colon 
tumors examined compared with normal adjacent mucosa. 
Four of 19 tumors showed greater than 10-fold overexpression. 
In contrast, in 79% (15/19) of the tumors examined, WISP-2 
RNA expression was significantly lower in the tumor than the 
mucosa. Similar to WISP-1, WISP-3 RNA was overexpressed in 
63% (12/19) of the colon tumors compared with the normal 
Fig. 7. WISP RNA expression in primary human colon tumors 
relative to expression in normal mucosa from the same patient. 
Expression of WISP mRNA in 19 adenocarcinomas was assayed by 
quantitative PCR. The Dukes stage of the tumor is listed under the 
sample number. The data are means ± SEM from one experiment 
done in triplicate. The experiment was repeated at least twice. 



mucosa. The amount of overexpression of WISP-3 ranged from 
4- to >40-fold. 

DISCUSSION 

One approach to understanding the molecular basis of cancer 
is to identify differences in gene expression between cancer 
cells and normal cells. Strategies based on assumptions that 
steady-state mRNA levels will differ between normal and 
malignant cells have been used to clone differentially ex- 
pressed genes (31). We have used a PCR-based selection 
strategy, SSH, to identify genes selectively expressed in 
C57MG mouse mammary epithelial cells transformed by 
Wnt-1. 

Three of the genes isolated, WISP-1, WISP-2, and WISP-3, 
are members of the CCN family of growth factors, which 
includes CTGF, Cyr61, and nov, a family not previously linked 
to Wnt signaling. 

Two independent experimental systems demonstrated that 
WISP induction was associated with the expression of Wnt-1. 
The first was C57MG cells infected with a Wnt-1 retroviral 
vector or C57MG cells expressing Wnt-1 under the control of 
a tetracycline-repressible promoter, and the second was in 
Wnt-1 transgenic mice, where breast tissue expresses Wnt-1, 
whereas normal breast tissue does not. No WISP RNA expres- 
sion was detected in mammary tumors induced by polyoma 
virus middle T antigen (data not shown). These data suggest 
a link between Wnt-1 and WISPs in that in these two situations, 
WISP induction was correlated with Wnt-1 expression. 

It is not clear whether the WISPs are directly or indirectly 
induced by the downstream components of the Wnt-1 signaling 
pathway (i.e., β-catenin-TCF-1/Lef1). The increased levels of 
WISP RNA were measured in Wnt-1-transformed cells, hours 
or days after Wnt-1 transformation. Thus, WISP expression 
could result from Wnt-1 signaling directly through β-catenin 
transcription factor regulation or alternatively through Wnt-1 
signaling turning on a transcription factor, which in turn 
regulates WISPs. 

The WISPs define an additional subfamily of the CCN family 
of growth factors. One striking difference observed in the 
protein sequence of WISP-2 is the absence of a CT domain, 
which is present in CTGF, Cyr61, nov, WISP-1, and WISP-3. 
This domain is thought to be involved in receptor binding and 
dimerization. Growth factors, such as TGF-β, platelet-derived 
growth factor, and nerve growth factor, which contain a cystine 
knot motif exist as dimers (32). It is tempting to speculate that 
WISP-1 and WISP-3 may exist as dimers, whereas WISP-2 
exists as a monomer. If the CT domain is also important for 
receptor binding, WISP-2 may bind its receptor through a 
different region of the molecule than the other CCN family 
members. No specific receptors have been identified for CTGF 
or nov. A recent report has shown that integrin αvβ3 serves as 
an adhesion receptor for Cyr61 (33). 

The strong expression of WISP-1 and WISP-2 in cells lying 
within the fibrovascular tumor stroma in breast tumors from 
Wnt-1 transgenic animals is consistent with previous obser- 
vations that transcripts for the related CTGF gene are pri- 
marily expressed in the fibrous stroma of mammary tumors 
(34). Epithelial cells are thought to control the proliferation of 
connective tissue stroma in mammary tumors by a cascade of 
growth factor signals similar to that controlling connective 
tissue formation during wound repair. It has been proposed 
that mammary tumor cells or inflammatory cells at the tumor 
interstitial interface secrete TGF-β1, which is the stimulus for 
stromal proliferation (34). TGF-β1 is secreted by a large 
percentage of malignant breast tumors and may be one of the 
growth factors that stimulates the production of CTGF and 
WISPs in the stroma. 

It was of interest that WISP-1 and WISP-2 expression was 
observed in the stromal cells that surrounded the tumor cells 
(epithelial cells) in the Wnt-1 transgenic mouse sections of 
breast tissue. This finding suggests that paracrine signaling 
could occur in which the stromal cells could supply WISP-1 and 
WISP-2 to regulate tumor cell growth on the WISP extracel- 
lular matrix. Stromal cell-derived factors in the extracellular 
matrix have been postulated to play a role in tumor cell 
migration and proliferation (35). The localization of WISP-1 
and WISP-2 in the stromal cells of breast tumors supports this 
paracrine model. 

An analysis of WISP-1 gene amplification and expression in 
human colon tumors showed a correlation between DNA 
amplification and overexpression, whereas overexpression of 
WISP-3 RNA was seen in the absence of DNA amplification. 
In contrast, WISP-2 DNA was amplified in the colon tumors, 
but its mRNA expression was significantly reduced in the 
majority of tumors compared with the expression in normal 
colonic mucosa from the same patient. The gene for human 
WISP-2 was localized to chromosome 20q12-20q13, at a region 
frequently amplified and associated with poor prognosis in 
node negative breast cancer and many colon cancers, suggest- 
ing the existence of one or more oncogenes at this locus 
(36-38). Because the center of the 20q13 amplicon has not yet 
been identified, it is possible that the apparent amplification 
observed for WISP-2 may be caused by another gene in this 
amplicon. 

A recent manuscript on rCop-1, the rat orthologue of 
WISP-2, describes the loss of expression of this gene after cell 
transformation, suggesting it may be a negative regulator of 
growth in cell lines (16). Although the mechanism by which 
WISP-2 RNA expression is down-regulated during malignant 
transformation is unknown, the reduced expression of WISP-2 
in colon tumors and cell lines suggests that it may function as 
a tumor suppressor. These results show that the WISP genes 
are aberrantly expressed in colon cancer and suggest that their 
altered expression may confer selective growth advantage to 
the tumor. 

Members of the Wnt signaling pathway have been impli- 
cated in the pathogenesis of colon cancer, breast cancer, and 
melanoma, including the tumor suppressor gene adenomatous 
polyposis coli and β-catenin (39). Mutations in specific regions 
of either gene can cause the stabilization and accumulation of 
cytoplasmic β-catenin, which presumably contributes to hu- 
man carcinogenesis through the activation of target genes such 
as the WISPs. Although the mechanism by which Wnt-1 
transforms cells and induces tumorigenesis is unknown, the 
identification of WISPs as genes that may be regulated down- 
stream of Wnt-1 in C57MG cells suggests they could be 
important mediators of Wnt-1 transformation. The amplifica- 
tion and altered expression patterns of the WISPs in human 
colon tumors may indicate an important role for these genes 
in tumor development. 

We have determined the relationship between mRNA and protein expression levels for selected genes 
expressed in the yeast Saccharomyces cerevisiae growing at mid-log phase. The proteins contained in total yeast 
cell lysate were separated by high-resolution two-dimensional (2D) gel electrophoresis. Over 150 protein spots 
were excised and identified by capillary liquid chromatography-tandem mass spectrometry (LC-MS/MS). 
Protein spots were quantified by metabolic labeling and scintillation counting. Corresponding mRNA levels 
were calculated from serial analysis of gene expression (SAGE) frequency tables (V. E. Velculescu, L. Zhang, 
W. Zhou, J. Vogelstein, M. A. Basrai, D. E. Bassett, Jr., P. Hieter, B. Vogelstein, and K. W. Kinzler, Cell 
88:243-251, 1997). We found that the correlation between mRNA and protein levels was insufficient to predict 
protein expression levels from quantitative mRNA data. Indeed, for some genes, while the mRNA levels were 
of the same value the protein levels varied by more than 20-fold. Conversely, invariant steady-state levels of 
certain proteins were observed with respective mRNA transcript levels that varied by as much as 30-fold. 
Another interesting observation is that codon bias is not a predictor of either protein or mRNA levels. Our 
results clearly delineate the technical boundaries of current approaches for quantitative analysis of protein 
expression and reveal that simple deduction from mRNA transcript analysis is insufficient. 



The description of the state of a biological system by the 
quantitative measurement of the system constituents is an es- 
sential but largely unexplored area of biology. With recent 
technical advances including the development of differential 
display-PCR (21), of cDNA microarray and DNA chip tech- 
nology (20, 27), and of serial analysis of gene expression 
(SAGE) (34, 35), it is now feasible to establish global and 
quantitative mRNA expression profiles of cells and tissues in 
species for which the sequence of all the genes is known. 
However, there is emerging evidence which suggests that 
mRNA expression patterns are necessary but are by them- 
selves insufficient for the quantitative description of biological 
systems. This evidence includes discoveries of posttranscrip- 
tional mechanisms controlling the protein translation rate (15), 
the half-lives of specific proteins or mRNAs (33), and the 
intracellular location and molecular association of the protein 
products of expressed genes (32). 

Proteome analysis, defined as the analysis of the protein 
complement expressed by a genome (26), has been suggested 
as an approach to the quantitative description of the state of a 
biological system by the quantitative analysis of protein expres- 
sion profiles (36). Proteome analysis is conceptually attractive 
because of its potential to determine properties of biological 
systems that are not apparent by DNA or mRNA sequence 
analysis alone. Such properties include the quantity of protein 
expression, the subcellular location, the state of modification, 
and the association with ligands, as well as the rate of change 
with time of such properties. In contrast to the genomes of a 
number of microorganisms (for a review, see reference 11) and 
the transcriptome of Saccharomyces cerevisiae (35), which have 
been entirely determined, no proteome map has been com- 
pleted to date. 

The most common implementation of proteome analysis is 
the combination of two-dimensional gel electrophoresis (2DE) 
(isoelectric focusing-sodium dodecyl sulfate [SDS]-polyacryl- 
amide gel electrophoresis) for the separation and quantitation 
of proteins with analytical methods for their identification. 
2DE permits the separation, visualization, and quantitation of 
thousands of proteins reproducibly on a single gel (18, 24). By 
itself, 2DE is strictly a descriptive technique. The combination 
of 2DE with protein analytical techniques has added the pos- 
sibility of establishing the identities of separated proteins (1, 2) 
and thus, in combination with quantitative mRNA analysis, of 
correlating quantitative protein and mRNA expression mea- 
surements of selected genes. 

The recent introduction of mass spectrometric protein anal- 
ysis techniques has dramatically enhanced the throughput and 
sensitivity of protein identification to a level which now permits 
the large-scale analysis of proteins separated by 2DE. The 
techniques have reached a level of sensitivity that permits the 
identification of essentially any protein that is detectable in the 
gels by conventional protein staining (9, 29). Current protein 
analytical technology is based on the mass spectrometric gen- 
eration of peptide fragment patterns that are idiotypic for the 
sequence of a protein. Protein identity is established by corre- 
lating such fragment patterns with sequence databases (10, 22, 
37). Sophisticated computer software (8) has automated the 
entire process such that proteins are routinely identified with 
no human interpretation of peptide fragment patterns. 
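
To make the idea of a sequence-specific fragment pattern concrete, the following Python sketch computes the two complementary singly charged fragment series (conventionally called b and y ions) for a short peptide, which is the kind of predicted spectrum that a database search compares with a measured MS/MS spectrum. The b/y nomenclature and the monoisotopic masses are standard values supplied here as assumptions and are not given in the text. The function name and the example peptide are hypothetical, and real search software scores far more than this.

```python
# Monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of the charge-carrying proton
WATER = 18.010565   # H2O retained by C-terminal (y) fragments


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    Entry i of each list corresponds to cleavage after residue i + 1, so
    b_ions[i] is the N-terminal piece and y_ions[i] the C-terminal piece.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions = []
    y_ions = []
    for cut in range(1, len(masses)):
        b_ions.append(sum(masses[:cut]) + PROTON)
        y_ions.append(sum(masses[cut:]) + WATER + PROTON)
    return b_ions, y_ions


# Hypothetical tripeptide used only to show the shape of the output.
b_series, y_series = fragment_ions("GAS")
print(b_series, y_series)
```

Because each cleavage yields one N-terminal and one C-terminal piece, the b and y values for the same cleavage site always sum to the peptide mass plus one water and two protons, which is a useful internal check on a predicted spectrum.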

In this study, we have analyzed the mRNA and protein levels 
of a group of genes expressed in exponentially growing cells of 
the yeast S. cerevisiae. Protein expression levels were quantified 
by metabolic labeling of the yeast proteins to a steady state, 
followed by 2DE and liquid scintillation counting of the se- 
lected, separated protein species. Separated proteins were 
identified by in-gel tryptic digestion of spots with subsequent 
analysis by microspray liquid chromatography-tandem mass 
spectrometry (LC-MS/MS) and sequence database searching. 
The corresponding mRNA transcript levels were calculated 
from SAGE frequency tables (35). 

This study, for the first time, explores a quantitative com- 
parison of mRNA transcript and protein expression levels for 
a relatively large number of genes expressed in the same met- 
abolic state. The resultant correlation is insufficient for predic- 
FIG. 1. Schematic illustration of proteome analysis by 2DE and mass spectrometry. In part I, proteins are separated by 2DE, stained spots are excised and subjected 
to in-gel digestion with trypsin, and the resulting peptides are separated by on-line capillary high-performance liquid chromatography. In part II, a peptide is shown 
eluting from the column in part I. The peptide is ionized by electrospray ionization and enters the mass spectrometer. The mass of the ionized peptide is detected, and 
the first quadrupole mass filter allows only the specific mass-to-charge ratio of the selected peptide ion to pass into the collision cell. In the collision cell, the energized, 
ionized peptides collide with neutral argon gas molecules. Fragmentation of the peptide is essentially random but occurs mainly at the peptide bonds, resulting in smaller 
peptides of differing lengths (masses). These peptide fragments are detected as a tandem mass (MS/MS) spectrum in the third quadrupole mass filter where two ion 
series are recorded simultaneously, one each from sequencing inward from the N and C termini of the peptide, respectively. In part III, the MS/MS spectrum from the 
selected, ionized peptide is compared to predicted tandem mass spectra computer generated from a sequence database. Provided that the peptide sequence exists in 
the database, the peptide and, by association, the protein from which the peptide was derived can be identified. Unambiguous protein identification is attained in a single 
analysis because multiple peptides are identified as being derived from the same protein. 



tion of protein levels from mRNA transcript levels. We have 
also compared the relative amounts of protein and mRNA 
with the respective codon bias values for the corresponding 
genes. This comparison indicates that codon bias by itself is 
insufficient to accurately predict either the mRNA or the pro- 
tein expression levels of a gene. In addition, the results dem- 
onstrate that only highly expressed proteins are detectable by 
2DE separation of total cell lysates and that therefore the 
construction of complete proteome maps with current technol- 
ogy will be very challenging, irrespective of the type of organ- 
ism. 

MATERIALS AND METHODS 

Yeast strain and growth conditions. The source of protein and
message transcripts for all experiments was YPH499 (MATa ura3-52
lys2-801 ade2-101 leu2-Δ1 his3-Δ200 trp1-Δ63) (30). Logarithmically
growing cells were obtained by growing yeast cells to early log
phase (3 × 10^6 cells/ml) in YPD rich medium (YPD supplemented with
6 mM uracil, 4.8 mM adenine, and 24 mM tryptophan) at 30°C (35).
Metabolic labeling of protein was accomplished in YPD medium
exactly as described elsewhere (4) with the exception that 1 ml of
cells was labeled with 3 mCi to offset methionine present in YPD
medium. Protein was harvested as described by Garrels and coworkers
(12). Harvested protein was lyophilized, resuspended in isoelectric
focusing gel rehydration solution, and stored at -80°C.

FIG. 2. 2D silver-stained gel of the proteins in yeast total cell lysate. Proteins were separated in the first dimension (horizontal) by isoelectric focusing and then in
the second dimension (vertical) by molecular weight sieving. Protein spots (156) were chosen to include the entire range of molecular weights, isoelectric focusing points,
and staining intensities. Spots were excised, and the corresponding protein was identified by mass spectrometry and database searching. The spots are labeled on the
gel and correspond to the data presented in Table 1. Molecular weights are given in thousands.

2DE. Soluble proteins were run in the first dimension by using a
commercial flatbed electrophoresis system (Multiphor II; Pharmacia
Biotech). Immobilized polyacrylamide gel (IPG) dry strips with
nonlinear pH 3.0 to 10.0 gradients (Amersham-Pharmacia Biotech)
were used for the first-dimension separation. Forty micrograms of
protein from whole-cell lysates was mixed with IPG strip
rehydration buffer (8 M urea, 2% Nonidet P-40, 10 mM
dithiothreitol), and 250 to 380 µl of solution was added to
individual lanes of an IPG strip rehydration tray
(Amersham-Pharmacia Biotech). The strips were allowed to rehydrate
at room temperature for 1 h. The samples were run at 300 V-10 mA-5 W
for 2 h, then ramped to 3,500 V-10 mA-5 W over a period of 3 h, and
then kept at 3,500 V-10 mA-5 W for 15 to 19 h. At the end of the
first-dimension run (60 to 70 kV·h), the IPG strips were
reequilibrated for 8 min in 2% (wt/vol) dithiothreitol in 2%
(wt/vol) SDS-6 M urea-30% (wt/vol) glycerol-0.05 M Tris-HCl (pH 6.8)
and for 4 min in 2.5% iodoacetamide in 2% (wt/vol) SDS-6 M urea-30%
(wt/vol) glycerol-0.05 M Tris-HCl (pH 6.8). Following
reequilibration, the strips were transferred and apposed to 10%
polyacrylamide second-dimension gels. Polyacrylamide gels were
poured in a casting stand with 10% acrylamide-2.67% piperazine
diacrylamide-0.375 M Tris base-HCl (pH 8.8)-0.1% (wt/vol) SDS-0.05%
(wt/vol) ammonium persulfate-0.05% TEMED
(N,N,N',N'-tetramethylethylenediamine) in Milli-Q water. The
apparatus used to run second-dimension gels was a noncommercial
apparatus from Oxford Glycosciences, Inc. Once the IPG strips were
apposed to the second-dimension gels, they were immediately run at
50 mA (constant)-500 V-85 W for 20 min, followed by 200 mA
(constant)-500 V-85 W until the buffer front line was 10 to 15 mm
from the bottom of the gel. Gels were removed and silver stained
according to the procedure of Shevchenko et al. (29).
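
The quoted first-dimension total of 60 to 70 kV·h can be checked against the three focusing phases. The sketch below is only a plausibility check: it assumes that the voltage limit (rather than the current or power limit) was reached in every phase and that the ramp from 300 V to 3,500 V was linear, neither of which is stated in the text. The function name is illustrative.

```python
def focusing_kvh(hold_hours):
    """Approximate volt-hours (in kV*h) for the first-dimension run.

    Assumptions (not stated in the text): the voltage limit is reached
    in each phase, and the ramp from 300 V to 3,500 V is linear.
    """
    start = 0.3 * 2.0                # 300 V held for 2 h
    ramp = (0.3 + 3.5) / 2.0 * 3.0   # mean voltage during the 3 h ramp
    hold = 3.5 * hold_hours          # 3,500 V held for 15 to 19 h
    return start + ramp + hold


low = focusing_kvh(15)
high = focusing_kvh(19)
print(f"{low:.1f} to {high:.1f} kV*h")
```

Under these assumptions the estimate comes out at roughly 59 to 73 kV·h, which brackets the 60 to 70 kV·h given above.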

Protein identification. Gels were exposed to X-ray film overnight, and then the 
silver staining and film were used to excise 156 spots of varying intensities, 
molecular weights, and isoelectric focusing points. In order to increase the 
detection limit by mass spectrometry, spots were cut out and pooled from up to 
four identical cold, silver-stained gels. In-gel tryptic digests of pooled spots were 
performed as described previously (29). Tryptic peptides were analyzed by mi- 
crocapillary LC-MS with automated switching to MS/MS mode for peptide 
fragmentation. Spectra were searched against the composite OWL protein se- 
quence database (version 30.2; 250,514 protein sequences) (24a) by using the 
computer program Sequest (8), which matches theoretical and acquired tandem 
mass spectra. A protein match was determined by comparing the number of 
peptides identified and their respective cross-correlation scores. All protein 
identifications were verified by comparison with theoretical molecular weights 
and isoelectric points. 



mRNA quantitation. Velculescu and coworkers have previously
generated frequency tables for yeast mRNA transcripts from the same
strain grown under the same stated conditions as described herein
(35). The SAGE technology is based on two main principles. First, a
short sequence tag (15 bp) that contains sufficient information to
uniquely identify a transcript is generated. A single tag is
usually generated from each mRNA transcript in the cell which
corresponds to 15 bp at the 3'-most cutting site for NlaIII. Second,
many transcript tags can be concatenated into a single molecule and
then sequenced, revealing the identity of multiple tags
simultaneously. Over 20,000 transcripts were sequenced from yeast
strain YPH499 growing at mid-log phase on glucose. Assuming the
previously derived estimate of 15,000 mRNA molecules per cell (16),
this would represent a 1.3-fold coverage even for mRNA molecules
present at a single copy per cell and would provide a 72%
probability of detecting such transcripts. Computer software which
took for input the gene detected, examined the nucleotide sequence,
and performed the calculation as described by Velculescu and
coworkers (35) was written. In practice, we found that for 21 of
128 (16%) genes examined, viable mRNA levels could not be calculated
from the SAGE data. This was because (i) no CATG site was found in
the open reading frame (ORF), (ii) a CATG site was found but the
corresponding 10-bp putative SAGE tag was not found in the frequency
tables, or (iii) identical putative SAGE tags were present for
multiple genes (e.g., TDH2_YEAST and TDH3_YEAST).
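
The tag lookup described above can be sketched as follows. This is a minimal illustration and not the authors' software: it assumes the tag is the 10 bases immediately 3' of the last CATG (NlaIII site) in the supplied sequence, the function names are hypothetical, and the sequences are invented toy strings rather than real yeast ORFs.

```python
from collections import defaultdict

NLAIII_SITE = "CATG"


def putative_sage_tag(orf, tag_len=10):
    """Return the tag_len bases 3' of the last CATG in orf, or None."""
    seq = orf.upper()
    pos = seq.rfind(NLAIII_SITE)          # 3'-most NlaIII site
    if pos == -1:
        return None                       # case (i): no CATG site
    tag = seq[pos + 4:pos + 4 + tag_len]
    # A site too close to the 3' end cannot yield a full-length tag.
    return tag if len(tag) == tag_len else None


def shared_tags(orfs):
    """Return tags that occur in more than one gene (case iii)."""
    genes_by_tag = defaultdict(list)
    for gene, seq in orfs.items():
        tag = putative_sage_tag(seq)
        if tag is not None:
            genes_by_tag[tag].append(gene)
    return {t: g for t, g in genes_by_tag.items() if len(g) > 1}


# Toy sequences, invented for illustration only.
toy = {
    "geneA": "ATGAAACATG" + "C" * 10 + "TAA",
    "geneB": "ATGGGGCATG" + "C" * 10 + "TGA",
    "geneC": "ATGAAATTTGGGTAA",
}
for name, seq in toy.items():
    print(name, putative_sage_tag(seq))
print(shared_tags(toy))
```

Case (ii), a tag that is absent from the frequency tables, would then be a failed dictionary lookup of the returned tag in whatever table of tag counts is available.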
TABLE 1. Expressed genes identified from 2D gel in Fig. 2



Mol wt      pI        Spot no.  YPD gene    Protein       mRNA          Codon
                                name(a)     abundance     abundance     bias
                                            (10^3         (copies/
                                            copies/cell)  cell)

17,259      6.75      133       CPR1        15.2          61.7          0.769
18,702      4.80      83        EGD2        20.1          5.2           0.724
18,726      4.44      147       YKL056C     61.2          88.4          0.831
18,978      5.95      135       YER067W     3.7           6.7           0.118
19,108      5.04      130       YLR109W     94.4          9.7           0.680
19,681      9.08      136       ATP7        11.0                        0.246
20,505      6.07      111       GUK1        16.5          3.7           0.422
21,444      5.25      148       SAR1        5.4           10.4          0.455
21,583      4.98      95        TSA1        110.6         40.1          0.845
22,602      4.30      80        EFB1        66.1          23.8          0.875
23,079      6.29      112       SOD2        12.6          2.2           0.351
23,743      5.44      137       HSP26       NA(d)         0.7           0.434
24,033      5.97      96        ADK1        17.4          16.4          0.656
24,058      4.43      143       YKL117W     29.2          10.4          0.339
24,353      6.30      140       TFS1        8.1           0.7           0.146
24,662      5.85      99        URA5        25.4          6.0           0.359
24,808      6.33      97        GSP1        26.3          5.2           0.735
24,908      8.73      122       RPS5        18.6          NA(c)         0.899
25,081      4.65      81        MRP8        9.3           NA(c)         0.241
25,960      6.06      116       RPE1        5.8           0.7           0.372
26,378      9.55      127       RPS3        96.8          NA(c)         0.863
26,467      5.18      100       VMA4        10.5          3.7           0.427
26,661      5.84      98        TPI1        NA(d)         NA(c)         0.900
27,156      5.56      93        PRE8        6.9           0.7           0.129
27,334      6.13      115       YHR049W     18.4          2.2           0.520
27,472      5.33      92        YNL010W     31.6          3.7           0.421
27,480      8.95      123       GPM1        10.0          169.4         0.902
27,480      8.95      124       GPM1        231.4         169.4         0.902
27,480      8.95      125       GPM1        7.5           169.4         0.902
27,809      5.97      139       HOR2        5.7           0.7           0.381
27,874      4.46      78        YST1        13.6          52.8          0.805
28,595      4.51      41        PUP2        4.4           0.7           0.147
29,156      6.59      114       YMR226C     14.5          2.2           0.283
29,244      8.40      120       DPM1        5.0           11.2          0.362
29,443      5.91      48        PRE4        3.4           3.7           0.162
30,012      6.39      138       PRB1        21.2          1.5           0.449
30,073      4.63      77        BMH1        14.7          28.2          0.454
30,296      7.94      121       OMP2        67.4          41.6          0.499
30,435      6.34      89        GPP1        70.2          11.2          0.703
31,332      5.57      88        ILV6        13.9          3.0           0.402
32,159      5.46      113       IPP1        63.1          3.7           0.752
32,263      6.00      149       HIS1        22.4          4.5           0.232
33,311      5.35      84        SPE3        15.1          6.7           0.468
34,465      5.60      129       ADE1        8.7           5.2           0.305
34,762      5.32      85        SEC14       10.9          6.0           0.373
34,797      5.85      42        URA1        49.5          8.9           0.237
34,799      6.04      90        BEL1        103.2         81.0          0.875
35,556      5.97      43        YDL124W     6.4           4.5           0.206
35,619      8.41      59        TDH1        69.8          32.7(c)       0.940
35,650      5.49      68        CAR1        5.2           3.0           0.339
35,712      6.72      117       TDH2        49.6          473.0(c)      0.982
35,712      6.72      154       TDH2        863.5         473.0(c)      0.982
35,712      6.72      155       TDH2        79.4          473.0(c)      0.982
36,272      4.85      128       APA1        8.7           0.7           0.425
36,358      5.05      75        YJR105W     17.6          17.1          0.522
36,358      5.05      76        YJR105W     27.5          17.1          0.522
36,596      6.37      79        ADH2        58.9          260.0(c)      0.711
36,714      6.30      102       ADH1        746.1         260.0         0.913
36,714      6.30      103       ADH1        17.6          260.0         0.913
36,714      6.30      104       ADH1        61.4          260.0         0.913
36,714      6.30      105       ADH1        52.7          260.0         0.913
37,033      6.23      44        TAL1        44.8          3.7           0.701
37,796      7.36      57        IDH2        29.4          6.7           0.330
37,886      6.49      106       ILV5        76.0          4.5           0.892
38,700      7.83      55        BAT1        30.9          11.2          0.469
38,702      6.24      46        QCR2        NA(d)         2.2           0.326
39,477      5.58      86        FBA1        17.8          183.6         0.935
39,477      5.58      87        FBA1        427.2         183.6         0.935
39,540      6.50      150       HOM2        60.3          4.5           0.592
39,561      6.12      156       PSA1        96.4          27.5          0.718
41,158      6.01      49        YNL134C     14.9          1.5           0.316
41,623      7.18      58        BAT2        19.0          8.9           0.250
41,728      7.29      110       ERG10       24.1          4.5           0.543
41,900      5.42      74        TOM40       22.3          2.2           0.375
42,402      6.29      45        CYS3        6.7           8.9           0.621
42,883      5.63      67        DYS1        15.8          5.2           0.526
43,409      6.31      107       SER1        10.5          1.5           0.292
43,421      5.59      91        ERG6        2.2           14.1          0.408
44,174      7.32      56        YBR025C     13.1          6.0           0.684
44,682      4.99      72        TIF1        2.9           39.4          0.834
44,707      7.77      108       PGK1        23.7          165.7         0.897
44,707      7.77      109       PGK1        315.2         165.7         0.897
46,080      6.72      30        CAR2        15.4          NA(c)         0.495
46,383      8.52      53        IDP1        7.7           0.7           0.436
46,553      5.98      47        IDP2        32.4          NA(c)         0.197
46,679      6.39      50        ENO1        35.4          0.7           0.930
46,679      6.39      51        ENO1        6.6           0.7           0.930
46,679      6.39      52        ENO1        2.2           0.7           0.930
46,773      5.82      63        ENO2        15.5          289.1         0.960
46,773      5.82      64        ENO2        635.5         289.1         0.960
46,773      5.82      65        ENO2        93.0          289.1         0.960
46,773      5.82      66        ENO2        31.0          289.1         0.960
47,402      6.09      126       COR1        2.5           0.7           0.422
47,666      8.98      54        AAT2        11.7          6.0           0.338
48,364      5.25      73        WTM1        74.5          13.4          0.365
48,530      6.20      61        MET17       38.1          29.0          0.576
48,904      5.18      69        LYS9        16.2          3.7           0.463
48,987      4.90      153       SUP45       29.6          11.9          0.377
49,727      5.47      70        PRO2        13.6          5.2           0.297
49,912      9.27      62        TEF2        558.5         282.0         0.932
50,444      5.67      35        YDR190C     4.8           2.2           0.228
50,837      6.11      32        YEL047C     3.8           1.5           0.387
50,891      4.59      151       TUB2        11.2          7.4           0.404
51,547      6.80      27        LPD1        18.9          2.2           0.351
52,216      7.25      29        SHM2        19.7          7.4           0.722
52,859      5.54      37        YFR044C     30.2          6.7           0.442
53,798      5.19      71        HXK2        26.5          7.4           0.756
53,803      6.05      145       GYP6        4.4           0.7           0.147
54,403      5.29      39        ALD6        37.7          2.2           0.664
54,403      5.29      40        ALD6        6.6           2.2           0.664
54,502      6.20      31        ADE13       6.3           1.5           0.417
54,543      7.75      25        PYK1        225.3         101.8         0.965
54,543      7.75      26        PYK1        39.8          101.8         0.965
55,221      6.66      146       YEL071W     16.3          3.0           0.244
55,295      4.35      134       PDI1        66.2          14.1          0.589
55,364      5.98      24        GLK1        22.6          6.0           0.237
55,481      7.97      118       ATP1        21.6          2.2           0.637
55,886      6.47      28        CYS4        22.2          NA(c)         0.444
56,167      5.83      33        ARO8        14.3          3.0           0.324
56,167      5.83      34        ARO8        9.1           3.0           0.324
56,584      6.36      20        CYB2        18.9          NA(b)         0.259
57,366      5.53      60        FRS2        2.3           0.7           0.451
57,383      5.98(e)   144       ZWF1        5.6           0.7           0.215
57,464      5.49      36        THR4        21.4          3.7           0.508
57,512      5.50      7         SRV2        6.5           NA(c)         0.260
57,727      4.92      152       VMA2        33.7          8.9           0.546
58,573      6.47      17        ACH1        4.4           1.5           0.327
58,573      6.47      18        ACH1        5.4           1.5           0.327
61,353      5.87      21        PDC1        6.5           200.7         0.962
61,353      5.87      22        PDC1        303.2         200.7         0.962
61,353      5.87      23        PDC1        16.3          200.7         0.962
61,649      5.54      38        CCT8        2.2           1.5           0.271
61,902      6.21      101       PDC5        4.3           NA(c)         0.828
62,266      6.19      16        TCL1        20.1          NA(c)         0.327
62,862      8.02      19        ILV3        5.3           4.5           0.548
63,082      6.40      119       PGM2        2.2           3.0           0.402
64,335      5.77      5         PAB1        30.4          1.5           0.616
66,120      5.42      8         STI1        6.7           0.7           0.313
66,120      5.42      9         STI1        6.4           0.7           0.313
66,450      5.29      141       SSB2        7.0           NA(c)         0.880
66,450      5.29      142       SSB2        2.3           NA(c)         0.880
66,456      5.23      10        SSB1        64.5          79.5          0.907
66,456      5.23      11        SSB1        59.0          79.5          0.907
66,456      5.23      12        SSB1        13.7          79.5          0.907
68,397      5.82      82        LEU4        3.1           3.0           0.407
69,313      4.90      13        SSA2        24.3          18.6          0.892
69,313      4.90      14        SSA2        77.1          18.6          0.892
74,378      8.46      15        YKL029C     2.8           3.7           0.353
75,396      5.82      6         GRS1        5.5           7.4           0.500
85,720      6.25      1         MET6        2.0           NA(c)         0.772
85,720      6.25      2         MET6        10.9          NA(c)         0.772
85,720      6.25      3         MET6        1.4           NA(c)         0.772
93,276      6.11      131       EFT1        17.9          41.6          0.890
93,276      6.11      132       EFT1        5.7           41.6          0.890
102,064(e)  6.6r      94        ADE3        4.8           5.2           0.423
107,482(e)  5.33(e)   4         MCM3        2.7           NA(c)         0.240

(a) YPD gene names are available from the YPD website (39).
(b) NA, calculation could not be performed or was not available.
(c) mRNA data inconclusive or NA.
(d) No methionines in predicted ORF; therefore, protein concentration was not
determined.
(e) Measured molecular weight or pI did not match theoretical molecular weight
or pI.



Protein quantitation. [35S]methionine-labeled gels were exposed to
X-ray film overnight, and then the silver stain and film were used
to excise 156 spots of varying intensities, molecular weights, and
pIs. The excised spots were placed in 0.6-ml microcentrifuge tubes,
and scintillation cocktail (100 µl) was added. The samples were
vortexed and counted. In addition, two parallel gels were
electroblotted to polyvinylidene difluoride membranes. The
membranes were exposed to X-ray film, and four intense single spots
were excised from each membrane and subjected to amino acid
analysis. For these four spots, a mean of 209 ± 4 cpm/pmol of
protein/methionine was found. This number was used to quantitate
all remaining spots in conjunction with the number of methionines
present in the protein.
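
The conversion implied by this calibration can be written out as a short sketch. It assumes that the counts for a spot scale linearly with the number of methionines in the protein, which is how the calibration value is stated; the function and constant names are illustrative. Converting picomoles into copies per cell would additionally require the number of cells represented on the gel, which is not given in this section, so the sketch stops at picomoles.

```python
CPM_PER_PMOL_PER_MET = 209.0  # mean calibration value quoted in the text


def spot_pmol(cpm, n_met):
    """Estimate pmol of protein in a spot from its scintillation counts.

    cpm   -- counts per minute measured for the excised spot
    n_met -- number of methionines in the protein (assumed known from
             the predicted sequence)
    """
    if n_met < 1:
        # No methionine means no label, so the protein cannot be
        # quantified this way (the NA entries for protein in Table 1).
        return None
    return cpm / (CPM_PER_PMOL_PER_MET * n_met)


# Hypothetical spot: 2,090 cpm for a protein with 5 methionines.
print(spot_pmol(2090.0, 5))
```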

To ensure that proteins were labeled to equilibrium, parallel 2D gels were 
prepared and run on yeast metabolically labeled for 1, 2, 6, or 18 h. The 
corresponding 156 spots were excised from each gel, and radioactivity was mea- 
sured by liquid scintillation counting for each spot. Calculated protein levels were 
highly reproducible for all time points measured after 1 h. 

Calculation of codon bias and predicted half-life. Codon bias values were 
extracted from the YPD spreadsheet (17). Protein half-lives were calculated 
based on the N-end rule (33). When the N-terminal processing was not known 
experimentally, it was predicted based on the affinity of methionine aminopep- 
tidase (31). 

RESULTS 

Characteristics of proteome approach. Nearly every facet of 
proteome analysis hinges on the unambiguous identification of 
large numbers of expressed proteins in cells. Several tech- 
niques have been described previously for the identification of 
proteins separated by 2DE, including N-terminal and internal 
sequencing (1, 2), amino acid analysis (38), and more recently 
mass spectrometry (25). We utilized techniques based on mass 
spectrometry because they afford the highest levels of sensitiv- 
ity and provide unambiguous identification. The specific pro- 
cedure used is schematically illustrated in Fig. 1 and is based 
on three principles. First, proteins are removed from the gel by
proteolytic in-gel digestion, and the resulting peptides are
separated by on-line capillary high-performance liquid
chromatography. Second, the eluting peptides are ionized and
detected, and
the specific peptide ions are selected and fragmented by the 
mass spectrometer. To achieve this, the mass spectrometer 
switches between the MS mode (for peptide mass identifica- 
tion) and the MS/MS mode (for peptide characterization and 
sequencing). Selected peptides are fragmented by a process 
called collision-induced dissociation (CID) to generate a tan- 
dem mass spectrum (MS/MS spectrum) that contains the pep- 
tide sequence information. Third, individual CID mass spectra 
are then compared by computer algorithms to predicted spec- 
tra from a sequence database. This results in the identification 
of the peptide and, by association, the protein(s) in the spot. 
Unambiguous protein identification is attained in a single anal- 
ysis by the detection of multiple peptides derived from the 
same protein. 

Protein identification. Yeast total cell protein lysate (40 µg),
metabolically labeled with [35S]methionine, was electrophoretically
separated by isoelectric focusing in the first dimension and by
SDS-10% polyacrylamide gel electrophoresis in
the second dimension. Proteins were visualized by silver stain- 
ing and by autoradiography. Of the more than 1,000 proteins 
visible by silver staining, 156 spots were excised from the gel 
and subjected to in-gel tryptic digestion, and the resulting 
peptides were analyzed and identified by microspray LC- 
MS/MS techniques as described above. The proteins in this 
study were all identified automatically by computer software 
with no human interpretation of mass spectra. They are indi- 
cated in Fig. 2 and detailed in Table 1. 

The CID spectra shown in Fig. 3 indicate that the quality of the
identification data generated was suitable for unambiguous protein
identification. The spectra represent the amino acid sequences of
tryptic peptides NSGDIVNLGSIAGR (Fig. 3A) and FAVGAFTDSLR (Fig.
3B). Both peptides were derived from protein S57593 (hypothetical
protein YMR226C), which migrated to spot 114 (molecular weight,
29,156; pI, 6.59) in the 2D gel in Fig. 2. Five other peptides from
the same analysis were also computer matched to the same protein
sequence.
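
To make the matching step concrete, the sketch below computes the doubly charged precursor m/z and the singly charged b- and y-ion series for the two peptides named above. It is not Sequest and does not reproduce its cross-correlation scoring; it only shows the kind of theoretical values a database search compares against an acquired spectrum. The residue masses are standard monoisotopic values, and all function names are illustrative. The monoisotopic results (about 686.9 and 592.3) fall slightly below the selected m/z values of 687.2 and 592.6 given in the Fig. 3 legend; one plausible reason, stated here as an assumption, is that the quadrupole measurement lies closer to the average mass of the unresolved isotope envelope.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.007276


def precursor_mz(peptide, charge):
    """m/z of the intact peptide carrying `charge` protons."""
    mass = sum(MONO[aa] for aa in peptide) + WATER
    return (mass + charge * PROTON) / charge


def b_ions(peptide):
    """Singly charged b ions (N-terminal fragments), b1 .. b(n-1)."""
    total, series = 0.0, []
    for aa in peptide[:-1]:
        total += MONO[aa]
        series.append(total + PROTON)
    return series


def y_ions(peptide):
    """Singly charged y ions (C-terminal fragments), y1 .. y(n-1)."""
    total, series = 0.0, []
    for aa in reversed(peptide[1:]):
        total += MONO[aa]
        series.append(total + WATER + PROTON)
    return series


for pep in ("NSGDIVNLGSIAGR", "FAVGAFTDSLR"):
    print(pep, round(precursor_mz(pep, 2), 2))
    print("  b:", [round(m, 2) for m in b_ions(pep)])
    print("  y:", [round(m, 2) for m in y_ions(pep)])
```

The two series are the ones mentioned in the Fig. 1 legend: the b ions read the sequence inward from the N terminus and the y ions inward from the C terminus.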

Protein and mRNA quantitation. For the 156 genes investi- 
gated, the protein expression levels ranged from 2,200 (PGM2) 
to 863,000 (TDH2/TDH3) copies/cell. The levels of mRNA for 
each of the genes identified were calculated from SAGE fre- 
quency tables (35). These tables contain the mRNA levels for 
4,665 genes in yeast strain YPH499 grown to mid-log phase in 
YPD medium on glucose as a carbon source. In some in- 
stances, the mRNA levels could not be calculated for reasons 
stated in Materials and Methods. For the proteins analyzed in 
this study, mean transcript levels varied from 0.7 to 473 copies/ 
cell. 

FIG. 3. Tandem mass (MS/MS) spectra resulting from analysis of a single spot on a 2D gel. The first quadrupole selected a single mass-to-charge ratio (m/z) of 687.2
(A) or 592.6 (B), while the collision cell was filled with argon gas, and a voltage which caused the peptide to undergo fragmentation by CID was applied. The third
quadrupole scanned the mass range from 50 to 1,400 m/z. The computer program Sequest (8) was utilized to match MS/MS spectra to amino acid sequence by database
searching. Both spectra matched peptides from the same protein, S57593 (yeast hypothetical protein YMR226C). Five other peptides from the same analysis were
matched to the same protein.

Selection of the sample population for mRNA-protein expression
level correlation. The protein spots selected for identification
were chosen from spots visible by silver staining in the 2D gel. An
attempt was made not to include spots where overlap with other
spots was readily apparent. The number of proteins identified was
156 (Table 1). Some proteins migrated to more than one spot
(presumably due to differential protein processing or
modifications), and protein levels from these spots were calculated
by integrating the intensities of the different spots. The 156
protein spots analyzed represented the products of 128 different
genes. Genes were excluded from the correlation analysis only if
part of the data set was missing; i.e., genes were excluded if (i)
no mRNA expression data were available for the protein or putative
SAGE tags were ambiguous, (ii) the amino acid sequence did not
contain methionine, (iii) more than a single protein was
conclusively identified as migrating to the same gel spot, or (iv)
the theoretical and observed pIs and molecular weights could not be
reconciled. After these criteria were applied, the number of genes
used in the correlation analysis was 106.
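
The two bookkeeping steps in this paragraph, summing the spots that belong to one gene and dropping genes with an incomplete data set, can be sketched as below. The rows are taken from Table 1, with `None` standing in for the NA entries. Only exclusion criteria (i) and (ii) are modeled, because (iii) and (iv) depend on gel-level information that is not in the table; the function name and record layout are illustrative.

```python
from collections import defaultdict

# (gene, spot no., protein in 10^3 copies/cell, mRNA in copies/cell)
SPOTS = [
    ("GPM1", 123, 10.0, 169.4),
    ("GPM1", 124, 231.4, 169.4),
    ("GPM1", 125, 7.5, 169.4),
    ("CPR1", 133, 15.2, 61.7),
    ("HSP26", 137, None, 0.7),   # no methionine, protein not determined
    ("RPS5", 122, 18.6, None),   # mRNA data inconclusive
]


def per_gene(spots):
    """Sum protein over a gene's spots; skip genes with missing data."""
    protein = defaultdict(float)
    mrna = {}
    incomplete = set()
    for gene, _spot, prot, message in spots:
        if prot is None or message is None:
            incomplete.add(gene)
            continue
        protein[gene] += prot     # integrate over all spots of the gene
        mrna[gene] = message      # one mRNA value per gene
    return {g: (protein[g], mrna[g])
            for g in protein if g not in incomplete}


usable = per_gene(SPOTS)
for gene, (prot, message) in sorted(usable.items()):
    print(gene, round(prot, 1), message)
```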



Codon bias and predicted half-lives. Codon bias is thought to be an
indicator of protein expression, with highly expressed proteins
having large codon bias values. The codon bias distribution for the
entire set of more than 6,000 predicted yeast gene ORFs is
presented in Fig. 4A. The interval with the
largest frequency of genes is between the codon bias values of 
0.0 and 0.1. This segment contains more than 2,500 genes. The 
distribution of the codon bias values of the 128 different genes 
found in this study (all protein spots from Fig. 2) is shown in 
Fig. 4B, and protein half-lives (predicted from applying the 
N-end rule [33] to the experimentally determined or predicted 
protein N termini) are shown in Fig. 4C. No genes were iden- 
tified with codon bias values less than 0.1 even though thou- 
sands of genes exist in this category. In addition, nearly all of 
the proteins identified had long predicted half-lives (greater 
than 30 h). 

Correlation of mRNA and protein expression levels. The correlation
between mRNA and protein levels of the genes
selected as described above is shown in Fig. 5. For the entire 
group (106 genes) for which a complete data set was gener- 
ated, there was a general trend of increased protein levels 
resulting from increased mRNA levels. The Pearson product 
moment correlation coefficient for the whole data set (106 
genes) was 0.935. This number is highly biased by a small 
number of genes with very large protein and message levels. A 
more representative subset of the data is shown in the inset of 
Fig. 5. It shows genes for which the message level was below 10 
copies/cell and includes 69% (73 of 106 genes) of the data used 
in the study. The Pearson product moment correlation coeffi- 
cient for this data set was only 0.356. We also found that levels 
of protein expression coded for by mRNA with comparable 
abundance varied by as much as 30-fold and that the mRNA 
levels coding for proteins with comparable expression levels 
varied by as much as 20-fold. 
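
The effect described here, a correlation coefficient inflated by a few very abundant genes, can be reproduced on a small scale. The sketch below uses 15 single-spot genes copied from Table 1, chosen only for illustration, so the coefficients it prints are not the 0.935 and 0.356 reported for the full 106-gene set. The subset threshold of 10 mRNA copies/cell follows the inset of Fig. 5; the `pearson` helper is a plain textbook implementation.

```python
import math

# (gene, mRNA copies/cell, protein 10^3 copies/cell), from Table 1.
GENES = [
    ("EGD2", 5.2, 20.1), ("YER067W", 6.7, 3.7), ("YLR109W", 9.7, 94.4),
    ("GUK1", 3.7, 16.5), ("SOD2", 2.2, 12.6), ("TFS1", 0.7, 8.1),
    ("URA5", 6.0, 25.4), ("IPP1", 3.7, 63.1), ("ILV5", 4.5, 76.0),
    ("PAB1", 1.5, 30.4), ("TEF2", 282.0, 558.5), ("CPR1", 61.7, 15.2),
    ("TSA1", 40.1, 110.6), ("BEL1", 81.0, 103.2),
    ("YKL056C", 88.4, 61.2),
]


def pearson(xs, ys):
    """Pearson product moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


low = [(m, p) for _, m, p in GENES if m < 10.0]
r_all = pearson([m for _, m, _ in GENES], [p for _, _, p in GENES])
r_low = pearson([m for m, _ in low], [p for _, p in low])
print(f"all {len(GENES)} genes: r = {r_all:.2f}")
print(f"{len(low)} genes below 10 copies/cell: r = {r_low:.2f}")
```

Even in this small sample, the coefficient for the whole set is much higher than for the low-message subset, largely because of the single TEF2 point.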

The distortion of the correlation value induced by the un- 
even distribution of the data points along the x axis is further 
demonstrated by the analysis in Fig. 6. The 106 samples in- 
cluded in the study were ranked by protein abundance, and the 
Pearson product moment correlation coefficient was repeat- 
edly calculated after including progressively more, and higher- 
abundance, proteins in each calculation. The correlation values 
remained relatively stable in the range of 0.1 to 0.4 if the 
lowest-expressed 40 to 95 proteins used in this study were 
included. However, the correlation value steadily climbed by 
the inclusion of each of the 11 very highly expressed proteins. 
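
The ranking analysis of Fig. 6 can be sketched in the same way: sort the genes by protein abundance and recompute the coefficient each time the next more abundant protein is added. The data are again 15 single-spot genes from Table 1, used only as an illustration, so the curve will not match the published figure; starting at three genes is an arbitrary choice made here so that the coefficient is defined.

```python
import math

# (gene, mRNA copies/cell, protein 10^3 copies/cell), from Table 1.
GENES = [
    ("EGD2", 5.2, 20.1), ("YER067W", 6.7, 3.7), ("YLR109W", 9.7, 94.4),
    ("GUK1", 3.7, 16.5), ("SOD2", 2.2, 12.6), ("TFS1", 0.7, 8.1),
    ("URA5", 6.0, 25.4), ("IPP1", 3.7, 63.1), ("ILV5", 4.5, 76.0),
    ("PAB1", 1.5, 30.4), ("TEF2", 282.0, 558.5), ("CPR1", 61.7, 15.2),
    ("TSA1", 40.1, 110.6), ("BEL1", 81.0, 103.2),
    ("YKL056C", 88.4, 61.2),
]


def pearson(xs, ys):
    """Pearson product moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def cumulative_correlation(genes, start=3):
    """Correlation for the k lowest-abundance proteins, k = start..n."""
    ranked = sorted(genes, key=lambda g: g[2])   # rank by protein level
    curve = []
    for k in range(start, len(ranked) + 1):
        subset = ranked[:k]
        r = pearson([g[1] for g in subset], [g[2] for g in subset])
        curve.append((k, r))
    return curve


curve = cumulative_correlation(GENES)
for k, r in curve:
    print(f"lowest {k:2d} proteins: r = {r:+.2f}")
```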

Correlation of protein and mRNA expression levels with 
codon bias. Codon bias is the propensity for a gene to utilize 
the same codon to encode an amino acid even though other 
codons would insert the identical amino acid in the growing 
polypeptide sequence. It is further thought that highly ex- 
pressed proteins have large codon biases (3). To assess the 
value of codon bias for predicting mRNA and protein levels in 
exponentially growing yeast cells, we plotted the two experi- 
mental sets of data versus the codon bias (Fig. 7). The distri- 
bution patterns for both mRNA and protein levels with respect 
to codon bias were highly similar. There was high variability in 
the data within the codon bias range of 0.8 to 1.0. Although a 
large codon bias generally resulted in higher protein and message
expression levels, codon bias did not appear to be predictive of
either protein levels or mRNA levels in the cell.

DISCUSSION 

The desired end point for the description of a biological 
system is not the analysis of mRNA transcript levels alone but 
also the accurate measurement of protein expression levels and 
their respective activities. Quantitative analysis of global 
mRNA levels currently is a preferred method for the analysis 
of the state of cells and tissues (11). Several methods which
either provide absolute mRNA abundance (34, 35) or relative mRNA
levels in comparative analyses (20, 27) have been described
elsewhere. The techniques are fast and exquisitely sensitive and
can provide mRNA abundance for potentially any expressed gene.
Measured mRNA levels are often implicitly or explicitly
extrapolated to indicate the levels of activity of the
corresponding protein in the cell. Quantitative analysis of protein
expression levels (proteome analysis) is much more time-consuming
because proteins are analyzed sequentially one by one and is not
general because analyses are limited to the relatively highly
expressed proteins. Proteome analysis does, however, provide types
of data that are of critical importance for the description of the
state of a biological system and that are not readily apparent from
the sequence and the level of expression of the mRNA transcript.
This study attempts to examine the relationship between mRNA and
protein expression levels for a large number of expressed genes in
cells representing the same state.

FIG. 4. Current proteome analysis technology utilizing 2DE without
preenrichment samples mainly highly expressed and long-lived
proteins. Genes encoding highly expressed proteins generally have
large codon bias values. (A) Distribution of the yeast genome (more
than 6,000 genes) based on codon bias. The interval with the
largest frequency of genes is 0.0 to 0.1, with more than 2,500
genes. (B) Distribution of the genes from identified proteins in
this study based on codon bias. No genes with codon bias values
less than 0.1 were detected in this study. (C) Distribution of
identified proteins in this study based on predicted half-life
(estimated by N-end rule).

Limits in the sensitivity of current protein analysis technology
precluded a completely random sampling of yeast proteins. We
therefore based the study on those proteins visible by silver
staining on a 2D gel. Of the more than 1,000 visible spots, 156
were chosen to include the entire range of molecular weights,
isoelectric focusing points, and staining intensities displayed on
the 2D protein pattern. The genes identified in this study shared a
number of properties. First, all of the proteins in this study had
a codon bias of greater than 0.1 and 93% were greater than 0.2
(Fig. 4B). Second, with few exceptions, the proteins in this study
had long predicted half-lives according to the N-end rule (Fig.
4C). Third, low-abundance proteins with regulatory functions such
as transcription factors or protein kinases were not identified.

FIG. 5. Correlation between protein and mRNA levels for 106 genes in yeast growing at log phase with glucose as a carbon source. mRNA and protein levels were
... 69% of the original data set. The Pearson product moment correlation for the entire data set was 0.935. The correlation for the inset containing 73 proteins (69%) was
only 0.356.

Because the population of proteins used in this study ap- 
pears to be fairly homogeneous with respect to predicted half- 
life and codon bias, it might be expected that the correlation of 
the mRNA and protein expression levels would be stronger for 
this population than for a random sample of yeast proteins. We 
tested this assumption by evaluating the correlation value if 
different subsets of the available data were included in the 
calculation. The 106 proteins were ranked from lowest to high- 
est protein expression level, and the trend in the correlation 
value was evaluated by progressively including more of the 
higher-abundance proteins in the calculation (Fig. 6). The cor- 
relation value when only the lower-abundance 40 to 93 pro- 
teins were examined was consistently between 0.1 and 0.4. If 
the 11 most abundant proteins were included, the correlation 
steadily increased to 0.94. We therefore expect that the corre- 
lation for all yeast proteins or for a random selection would be 
less than 0.4. The observed level of correlation between 
mRNA and protein expression levels suggests the importance 



of posttranslational mechanisms controlling gene expression. 
Such mechanisms include translational control (15) and con- 
trol of protein half-life (33). Since these mechanisms are also 
active in higher eukaryotic cells, we speculate that there is no 
predictive correlation between steady-state levels of mRNA 
and those of protein in mammalian cells. 
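
The subset analysis described above (rank the genes by protein abundance, then recompute the correlation as progressively more high-abundance genes are included) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the `start` parameter, and the toy numbers are assumptions made for the example. Only the rank-and-recompute procedure itself is taken from the text.

```python
import math

def pearson(xs, ys):
    """Pearson product moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def progressive_correlation(pairs, start=40):
    """pairs: list of (mrna_level, protein_level) tuples.

    Genes are ranked from lowest to highest protein level. The correlation
    is computed for the `start` lowest-abundance genes, then again each time
    one more gene is added, up to the full set.
    """
    ranked = sorted(pairs, key=lambda p: p[1])
    results = []
    for k in range(start, len(ranked) + 1):
        subset = ranked[:k]
        mrna = [m for m, _ in subset]
        protein = [p for _, p in subset]
        results.append((k, pearson(mrna, protein)))
    return results

# Hypothetical toy data (not from the study): three low-abundance genes
# with weak agreement, plus two abundant genes that dominate the fit.
toy = [(1, 10), (3, 20), (2, 30), (50, 400), (100, 900)]
trend = progressive_correlation(toy, start=3)
```

In the toy data the correlation for the three lowest-abundance genes is modest and rises sharply once the two abundant genes are included, which is the same qualitative effect the text reports for the 11 most abundant proteins.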

Like other large-scale analyses, the present study has several 
potential sources of error related to the methods used to de- 
termine mRNA and protein expression levels. The mRNA 
levels were calculated from frequency tables of SAGE data. 
This method is highly quantitative because it is based on actual 
sequencing of unique tags from each gene, and the number of 
times that a tag is represented is proportional to the number of 
mRNA molecules for a specific gene. This method has some 
limitations including the following: (i) the magnitude of the 
error in the measurement of mRNA levels is inversely propor- 
tional to the mRNA levels, (ii) SAGE tags from highly similar 
genes may not be distinguished and therefore are summed, (iii) 
some SAGE tags are from sequences in the 3' untranslated 
region of the transcript, (iv) incomplete cleavage at the SAGE 
tag site by the restriction enzyme can result in two tags repre- 
senting one mRNA, and (v) some transcripts actually do not 
generate a SAGE tag (34, 35). 

For the SAGE method, the error associated with a value 
increases with a decreasing number of transcripts per cell. The 
conclusions drawn from this study are dependent on the qual- 
ity of the mRNA levels from previously published data (35). 
Since more than 65% of the mRNA levels included in this 
study were calculated to 10 copies/cell or less (40% were less 
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FIG. 6. Effect of highly abundant proteins on Pearson product moment correlation coefficient for mRNA and protein abundance in yeast. The set of 106 genes was 
ranked according to protein abundance, and the correlation value was calculated by including the 40 lowest-abundance genes and then progressively including the 
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than 4 copies/cell), the error associated with these values may 
be quite large. The mRNA levels were calculated from more 
than 20,000 transcripts. Assuming that the estimate of 15,000 
mRNA molecules per cell is correct (16), this would mean that 
mRNA transcripts present at only a single copy per cell would 
be detected 72% of the time (35). The mRNA levels for each 
gene were carefully scrutinized, and only mRNA levels for 
which a high degree of confidence existed were included in the 
correlation value. 
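
The detection estimate above can be approximated with a simple sampling model. The sketch below is an assumption-laden illustration, not the calculation used in reference 35: it treats every sequenced tag as an independent draw from the cellular mRNA pool (a Poisson approximation), and the function and parameter names are hypothetical.

```python
import math

def detection_probability(tags_sequenced, transcripts_per_cell, copies_per_cell=1):
    """Probability of observing at least one tag for a given transcript.

    Assumes independent sampling of tags, so the number of tags observed
    for the transcript is approximately Poisson distributed.
    """
    expected_tags = tags_sequenced * copies_per_cell / transcripts_per_cell
    return 1.0 - math.exp(-expected_tags)

# 20,000 tags and 15,000 mRNA molecules per cell, single-copy transcript.
p_single_copy = detection_probability(20_000, 15_000)  # roughly 0.74 under these assumptions
```

Under these assumptions the result is in the low 70s percent, broadly in line with the 72% figure quoted in the text. The small difference may reflect a different tag count or model in the cited source.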

Protein abundance was determined by metabolic radiolabeling
with [35S]methionine. The calculation required knowledge
of three variables: the number of methionines in the mature 
protein, the radioactivity contained in the protein, and the 
specific activity of the radiolabel normalized per methionine. 
The number of methionines per protein was determined from 
the amino acid sequence of the proteins identified by tandem 
mass spectrometry. For some proteins, it was not known 
whether the methionine of the nascent polypeptide was pro- 
cessed away. The N termini of those proteins were predicted 
based on the specificity of methionine aminopeptidase (31). If 
the N-terminal processing did not conform to the predicted 
specificity of processing enzymes, the calculation of the num- 
ber of methionines would be affected. This discrepancy would 
affect most the quantitation of a protein with a very low num- 
ber of methionines. The average number of calculated methi- 
onines per protein in this study was 7.2. We therefore expect 
the potential for erroneous protein quantitation due to unusual
N-terminal processing to be small.
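
The three-variable calculation described above can be written out as a short sketch. The units (cpm, pmol) and all names are assumptions chosen for illustration, and the text does not give the exact expression the authors used. The second function shows why a one-methionine miscount matters most for proteins with few methionines.

```python
def protein_amount(radioactivity_cpm, methionines, cpm_per_pmol_met):
    """Amount of protein (pmol) in a spot, from its radioactivity.

    radioactivity_cpm : counts measured for the spot
    methionines       : number of methionines in the mature protein
    cpm_per_pmol_met  : specific activity normalized per methionine
    """
    if methionines <= 0:
        raise ValueError("protein must contain at least one methionine")
    return radioactivity_cpm / (methionines * cpm_per_pmol_met)

def miscount_error(true_methionines, assumed_methionines):
    """Relative error in the computed amount when the methionine count is wrong."""
    return abs(true_methionines / assumed_methionines - 1.0)

# Hypothetical numbers: a spot with 1400 cpm, 7 methionines, 100 cpm/pmol Met.
amount = protein_amount(1400.0, 7, 100.0)

# Miscounting by one methionine: a modest error at 7 residues,
# a much larger one at 2 residues.
error_typical = miscount_error(7, 8)
error_low_met = miscount_error(2, 3)
```

With an average of 7.2 methionines per protein, a single miscounted residue changes the result by roughly an eighth, which is consistent with the authors' expectation that the effect is small for most proteins.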



The amount of radioactivity contained in a single spot might 
be the sum of the radioactivity of comigrating proteins. Be- 
cause protein identification was based on tandem mass spec- 
trometry techniques, comigrating proteins could be identified. 
However, comigrating proteins were rarely detected in this 
study, most likely because relatively small amounts of total 
protein (40 µg) were initially loaded onto the gels, which resulted
in highly focused spots containing generally 1 to 25 ng of
protein. Because of the relatively small amount loaded, the 
concentrations of any potentially comigrating protein would 
likely be below the limit of detection of the mass spectrometry 
technique used in this study (1 to 5 ng) and below the limit of 
visualization by silver staining (1 to 5 ng). In the overwhelming 
majority of the samples analyzed, numerous peptides from a 
single protein were detected. It is assumed that any comigrat- 
ing proteins were at levels too low to be detected and that their 
influence in the calculation would be small. 

The specific activity of the radiolabel was determined by 
relating the precise amount of protein present in selected spots 
of a parallel gel, as determined by quantitative amino acid 
composition analysis, to the number of methionines present in 
the sequence of those proteins and the radioactivity deter- 
mined by liquid scintillation counting. It is possible that the 
resulting number might be influenced by unavoidable losses 
inherent in the amino acid analysis procedure applied. Because 
four different proteins were utilized in the calculation and the 
experiment was done in duplicate, the specific activity calcu- 
lated is thought to be highly accurate. Indeed, the specific 
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FIG. 7. Relationship between codon bias and protein and mRNA levels in this study. Yeast mRNA and protein expression levels were calculated as described in 
Materials and Methods. The data represent the same 106 genes as in Fig. 5. 



activities calculated for each of the four proteins varied by less 
than 10%. Any inconsistencies in the calculation of the specific 
activity would result in differences in the absolute levels calcu- 
lated but not in the relative numbers and would therefore not 
influence the correlation value determined. 

The protein quantitative method used eliminates a number 
of potential errors inherent in previous methods for the quan- 
titation of proteins separated by 2DE, such as preferential 
protein staining and bias caused by inequalities in the number 
of radiolabeled residues per protein. Any 2D gel-based method 
of quantitation is complicated by the fact that in some cases the 
translation products of the same mRNA migrated to different 
spots. One major reason is posttranslational modification or 
processing of the protein. Also, artifactual proteolysis during 
cell lysis and sample preparation can lead to multiple resolved 
forms of the protein. In such cases, the protein levels of spots 
coded for by the same mRNA were pooled. In addition, the 
existence of other spots coded for by the same mRNA that 
were not analyzed by mass spectrometry or that were below the 
limit of detection for silver staining cannot be ruled out. How- 
ever, since this study is based on a class of highly expressed 
proteins, the presence of undetected minor spots below silver 
staining sensitivity corresponding to a protein analyzed in the 
study would generally cause a relatively small error in protein 
quantitation. 

Codon bias is a measure of the propensity of an organism to 
selectively utilize certain codons which result in the incorpo- 
ration of the same amino acid residue in a growing polypeptide 
chain. There are 61 possible codons that code for 20 amino 
acids. The larger the codon bias value, the smaller the number 
of codons that are used to encode the protein (19). It is 



thought that codon bias is a measure of protein abundance 
because highly expressed proteins generally have large codon 
bias values (3, 13). 

Nearly all of the most highly expressed proteins had codon 
bias values of greater than 0.8. However, we detected a number 
of genes with high codon bias and relatively low protein abundance
(Fig. 7). For example, the expressed gene with both the
second largest protein and mRNA levels in the study was
ENO2_YEAST (775,000 and 289.1 copies/cell, respectively).
ENO1_YEAST was also present in the gel at much lower
protein and mRNA levels (44,200 and 0.7 copies/cell, respectively).
The codon bias values for ENO2 and ENO1 are similar
(0.96 and 0.93, respectively), but the expression of the two
genes is differentially regulated. Specifically, ENO1_YEAST is
glucose repressed (6) and was therefore present in low abundance
under the conditions used. Other genes with large codon
bias values that were not of high protein abundance in the gel
include EFT1, TIF1, HXK2, GSP1, EGD2, SHM2, and TAL1.
We conclude that merely determining the codon bias of a gene 
is not sufficient to predict its protein expression level. 

Interestingly, codon bias appears to be an excellent indicator 
of the boundaries of current 2D gel proteome analysis tech- 
nology. There are thousands of genes with expressed mRNA 
and likely expressed protein with codon bias values less than 
0.1 (Fig. 4A). In this study, we detected none of them, and only 
a very small percentage of the genes detected in this study had 
codon bias values between 0.1 and 0.2 (Fig. 4B). Indeed, in 
every examined yeast proteome study (5, 7, 13, 28) where the 
combined total number of identified proteins is 300 to 400, this 
same observation is true. It is expected that for the more 
complex cells of higher eukaryotic organisms the detection of 
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low-abundance proteins would be even more challenging than 
for yeast. This indicates that highly abundant, long-lived pro- 
teins are overwhelmingly detected in proteome studies. If pro- 
teome analysis is to provide truly meaningful information 
about cellular processes, it must be able to penetrate to the 
level of regulatory proteins, including transcription factors and 
protein kinases. A promising approach is the use of narrow-range
focusing gels with immobilized pH gradients (IPG) (23).
This would allow for the loading of significantly more protein 
per pH unit covered and also provide increased resolution of 
proteins with similar electrophoretic mobilities. A standard pH 
gradient in an isoelectric focusing gel covers a 7-pH-unit range 
(pH 3 to 10) over 18 cm. A narrow-range focusing gel might 
expand the range to 0.5 pH units over 18 cm or more. This 
could potentially increase by more than 10-fold the number of 
proteins that can be detected. Clearly, current proteome tech- 
nology is incapable of analyzing low-abundance regulatory pro- 
teins without employing an enrichment method for relatively 
low-abundance proteins. In conclusion, this study examined 
the relationship between yeast protein and message levels and 
revealed that transcript levels provide little predictive value 
with respect to the extent of protein expression. 
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High-throughput technologies, such as proteomic screening and DNA micro-arrays, produce vast 
amounts of data requiring comprehensive analytical methods to decipher the biologically relevant 
results. One approach would be to manually search the biomedical literature; however, this would be 
an arduous task. We developed an automated literature-mining tool, termed MedGene, which 
comprehensively summarizes and estimates the relative strengths of all human gene-disease
relationships in Medline. Using MedGene, we analyzed a novel micro-array expression dataset 
comparing breast cancer and normal breast tissue in the context of existing knowledge. We found no 
correlation between the strength of the literature association and the magnitude of the difference in 
expression level when considering changes as high as 5-fold; however, a significant correlation was
observed (r = 0.41; p « 0.05) among genes showing an expression difference of 10-fold or more.
Interestingly, this only held true for estrogen receptor (ER) positive tumors, not ER negative. MedGene
identified a set of relatively understudied, yet highly expressed genes in ER negative tumors worthy of
further examination.

Keywords: bioinformatics • micro-array • text-mining • gene-disease association • breast cancer



Introduction 

At its current pace, the accumulation of biomedical literature
outpaces the ability of most researchers and clinicians to stay
abreast of their own immediate fields, let alone cover a broader
range of topics. For example, to follow a single disease, e.g.,
breast cancer, a researcher would have had to scan 130 different
journals and read 27 papers per day in 1999. 1 This problem is
accentuated with high-throughput technologies such as DNA
micro-arrays and proteomics, which require the analysis of
large datasets involving thousands of genes, many of which are
unfamiliar to a particular researcher. In any microarray experiment,
thousands of genes may demonstrate statistically significant
expression changes, but only a fraction of these may
be relevant to the study. The ability to interpret these datasets 
would be enhanced if they could be compared to a compre- 
hensive summary of what is known about all genes. Thus, there 
is a need to summarize existing knowledge in a format that 
allows for the rapid analysis of associations between genes and 
diseases or other specific biological concepts. 

One solution to this problem is to compile structured digital 
resources, such as the Breast Cancer Gene Database 1 and the 
Tumor Gene Database. 2 However, as these resources are hand-curated,
the labor-intensive review process becomes a rate-limiting
step in the growth of the database. As a result, these
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databases have a limited scale and the genes are not selected 
in a systematic fashion.

An alternative approach is automated text mining: a method
which involves automated information extraction by searching
documents for text strings and analyzing their frequency and
context. This approach has been used successfully in several
instances for biological applications. In most cases, it has been
applied to extract information about the relationships or
interactions that proteins or genes have with one another, in
the literature or by functional annotation. 3-7 Thus far, few
publications have applied text-mining to examine the global
relationships between genes and diseases. Perez-Iratxeta et al.
automatically examined the GO (Gene Ontology) annotation
of genes and their predicted chromosomal locations in order
to identify genes linked to inherited disorders. 8

To obtain a more global understanding of disease development,
it would be valuable to incorporate information regarding
all possible gene-disease relationships, including biochemical, 
physiological, pharmacological, epidemiological, as well as 
genetic. This information would enable comprehensive com- 
parisons between large experimental datasets and existing 
knowledge in the literature. This would accomplish two things. 
First, it would serve to validate experiments by demonstrating 
that known responses occur as predicted. Second, it would
rapidly highlight which genes are corroborated by the literature
and which genes are novel in a given context. We have utilized
a computational approach to literature mining to produce a
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comprehensive set of gene-disease relationships. In addition, 
we have developed a novel approach to assess the strength of
each association based on the frequency of citation and co-citation.
We applied this tool to help interpret the data from a
large micro-array gene expression experiment comparing
normal and cancerous breast tissue.



Methods 

MedGene Database. MedGene is a relational database, storing
disease and gene information from NCBI, text mining results,
statistical scores, and hyperlinks to the primary literature.
MedGene has a web-based user interface for users to
query the database (http://hipseq.med.harvard.edu/MedGene/).

Text Mining Algorithms. MeSH files were downloaded from
the MeSH web site at NLM (National Library of Medicine)
(http://www.nlm.nih.gov/mesh/meshhome.html) and human disease
categories were selected. LocusLink files were downloaded from
the LocusLink web site at NCBI (http://www.ncbi.nih.gov/LocusLink/).
Official/preferred gene symbol, official/preferred
gene name, and gene alternative symbols and names, all
relevant annotations and URLs for each LocusLink record, were
collected. Gene search terms were used for literature searching
and included all qualified gene names, gene symbols, and gene
family terms. Primary gene keys, predominantly qualified gene
family terms and gene official/preferred symbols, were used
to index Medline records. If the official/preferred gene symbols
did not meet the standards to be an index, then qualified gene
official/preferred names were used. A local copy of Medline
records (up to July, 2002) was pre-selected.

A JAVA module examined the MeSH terms and then indexed 
each Medline record with the appropriate disease terms. A 
separate JAVA module was used to examine the titles and 
abstracts for gene search terms and then to index the gene-related
Medline records with the relevant primary gene key(s).

Statistical Methods. For every gene and disease pair, we 
counted records that were indexed for both gene and disease 
(double positive hits), for disease only (disease single hits), for
gene only (gene single hits), and for neither gene nor disease 
(double negative hits) to generate a 2 x 2 contingency table. 
On the basis of the contingency table-framework, we applied 
different statistical methods to estimate the strength of gene- 
disease relationships and evaluated the results. These methods 
included chi-square analysis, Fisher's exact probabilities, relative
risk of gene, and relative risk of disease' 6
(http://hipseq.med.harvard.edu/MedGene/). In addition, we computed
the "product of frequency", which is the product of the 
proportion of disease/gene double hits to disease single hits 
and the proportion of disease/gene double hits to gene single 
hits. To obtain a normal distribution, we transformed all the 
statistical scores using the natural logarithm. We selected the 
log of the product of frequency (LPF) to validate MedGene and 
to use for the analysis with the micro-array data. Spearman 
rank-correlation coefficients were used to assess the linear 
relationship between LPF and micro-array fold change in 
expression level. 
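
The contingency-table counts and the log of the product of frequency (LPF) described above can be sketched as below. This is an illustrative reading of the text rather than MedGene's actual implementation: the record format and function names are assumptions, and "single hits" is taken literally as records indexed for only the gene or only the disease, as defined in the counting step. Pairs with a zero count are returned as `None` because the logarithm is undefined for them.

```python
import math

def contingency_table(records, gene, disease):
    """Count records for one gene/disease pair.

    records: iterable of (gene_keys, disease_terms) pairs, each a set.
    Returns (double_hits, gene_single_hits, disease_single_hits, double_negatives).
    """
    both = gene_only = disease_only = neither = 0
    for gene_keys, disease_terms in records:
        has_gene = gene in gene_keys
        has_disease = disease in disease_terms
        if has_gene and has_disease:
            both += 1
        elif has_gene:
            gene_only += 1
        elif has_disease:
            disease_only += 1
        else:
            neither += 1
    return both, gene_only, disease_only, neither

def log_product_of_frequency(both, gene_only, disease_only):
    """Natural log of (double/disease single hits) * (double/gene single hits)."""
    if both == 0 or gene_only == 0 or disease_only == 0:
        return None
    return math.log((both / disease_only) * (both / gene_only))

# Hypothetical indexed records (not real Medline data).
records = [
    ({"ESR1"}, {"breast cancer"}),
    ({"ESR1"}, {"breast cancer"}),
    ({"ESR1"}, set()),
    (set(), {"breast cancer"}),
    (set(), {"breast cancer"}),
    (set(), set()),
]
both, gene_only, disease_only, neither = contingency_table(records, "ESR1", "breast cancer")
lpf = log_product_of_frequency(both, gene_only, disease_only)
```

If "single hits" is instead meant as all records indexed for the gene or disease (including double hits), only the two denominators change; the rest of the sketch stays the same.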

Global Analysis. Diseases with at least 50 related genes were 
selected for clustering analysis, and the LPF scores were 
normalized with total score for each disease. Hierarchical 
clustering was done with the "Cluster" software and the 
clustering result was visualized using "TreeViewer"
(http://rana.lbl.gov/EisenSoftware.htm).
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Breast Tissue Micro-Arrays. Eighty-nine breast cancer 
samples (?9% ER-positive) and 7 normal breast tissue samples
were selected from the Harvard Breast SPORE frozen tissue
repository and were representative of the spectrum of histological
types, grades, and hormone receptor immuno-phenotypes
of breast cancer. Biotinylated cRNA, generated from the
total RNA extracted from the bulk tumor, was hybridized to
Affymetrix U95A oligonucleotide micro-arrays. These micro-arrays
consist of 12 400 probes, which represent approximately
9000 genes. Raw expression values were obtained using GENECHIP
software from Affymetrix, and then further analyzed using
the DNA-Chip Analyzer (dChip) custom software.

Results 

Automated Indexing of Medline Records by Disease and 
Gene. To study the gene-disease associations in the literature, 
we first compiled complete lists for human diseases and human 
genes. To index all Medline records that were relevant to
human diseases, the Medical Subject Heading (MeSH) index 
of Medline records was utilized. MeSH is a controlled medical 
vocabulary from the National Library of Medicine and consists 
of a set of terms or subject headings that are arranged in both 
an alphabetic and an hierarchical structure. Medline records 
are reviewed manually and MeSH terms are added to each with 
software assistance. 9,10 Twenty-three human disease category
headings along with all of their child terms (see the Supporting
Information, Supplemental Table 1, or visit
http://hipseq.med.harvard.edu/MedGene/publication/s_Table 1.html) were
selected from the 2002 MeSH index, creating a list of 4033
human diseases.

No index comparable to the MeSH index exists for genes, 
and thus, it was necessary to apply a string search algorithm 
for gene names or symbols found in Medline text. A complete 
list of genes, gene names, gene symbols, and frequently used 
synonyms were collected from the LocusLink database at
NCBI, 11 11 which contains 53 259 independent records keyed
by an official gene symbol or name (June 18th, 2002). For the
purposes of this study, no distinction was made between genes
and their gene products. Authors often use the same name for
both, differentiating the two only by the use of italics, if at all.
For the intended use of this study, this lack of distinction is
unlikely to have a large effect and may in fact be beneficial.

Initial attempts to search the literature using these lists 
revealed several sources of false positives and false negatives 
(Table 1). False positives primarily arose when the searched 
term had other meanings, whereas false negatives arose from 
syntax discrepancies necessitating the development of filters
to reduce these errors. The syntax issues were readily handled 
by including alternate syntax forms in the search terms. The 
false positive cases, caused by duplicative and unrelated 
meanings for the terms, were more difficult to manage. Where 
possible, case sensitive string mapping reduced inappropriate 
citations. In many cases, however, this was not sufficient and 
the terms had to be eliminated entirely, thereby reducing the 
false positive rate but unavoidably under-representing some 
genes. 

For the purposes of data tracking, a primary gene key was 
selected to represent all synonyms that correspond to each 
gene. Medline records were indexed with a primary gene key 
when any synonym for that key was found in the title or 
abstract. Case-insensitive string mapping was used for all
searches except as noted above. No additional weight was 
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Table 1, Systematic Sources of False Positives and False Negatives in Unfiltered Data' 
source of error           error type       sample                                     filter solution

gene symbol/name          false positive   MAG - myelin associated glycoprotein,      eliminate this term
is not unique                              MAG - malignancy-associated protein
gene symbol is            false positive   PA - pallid homologue (mouse),             eliminate this term
unrelated abbreviation                     pallidin (also abbrev. for Pennsylvania)
gene symbol/name          false positive   WAS - Wiskott-Aldrich Syndrome             case-sensitive string search
has language meaning                       (also the word "was")
nonstandard syntax        false negative   BAG-1 instead of BAG1                      add dash term
unofficial gene           false negative   P53 instead of TP53                        add all gene nicknames
name/symbol
nonspecified gene name    false negative   estrogen receptor instead of               add family stem term
                                           estrogen receptor 1

* In preliminary studies, Medline was searched for co-occurrence of genes and diseases, and the resulting output was evaluated to identify error sources that
[illegible]. Each error source is categorized by the type of error it causes: false positives are suggested relationships that are not real and
[illegible] error. In general, error rules maximized sensitivity, even at the expense of specificity if needed.

added for multiple occurrences of a term or the co-occurrence 
of multiple synonyms for the same gene key.
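
The indexing rule described here (any synonym found in the title or abstract indexes the record under the primary gene key, with case-sensitive matching reserved for terms that collide with ordinary words) might look like the sketch below. It is a hypothetical illustration: the original work used JAVA modules whose details are not given, and the function names, the token-boundary rule, and the example synonym lists are assumptions.

```python
import re

def build_matchers(synonyms_by_key, case_sensitive_terms=frozenset()):
    """Compile one pattern per synonym, paired with its primary gene key.

    Terms listed in case_sensitive_terms are matched exactly; all other
    terms are matched case-insensitively. Matches must not be embedded in
    a longer alphanumeric token (an assumption, not stated in the text).
    """
    matchers = []
    for key, synonyms in synonyms_by_key.items():
        for term in synonyms:
            flags = 0 if term in case_sensitive_terms else re.IGNORECASE
            pattern = re.compile(
                r"(?<![A-Za-z0-9])" + re.escape(term) + r"(?![A-Za-z0-9])", flags
            )
            matchers.append((key, pattern))
    return matchers

def index_record(title, abstract, matchers):
    """Return the set of primary gene keys found in a record.

    Returning a set means repeated mentions, or several synonyms of the
    same gene, add no extra weight.
    """
    text = title + " " + abstract
    return {key for key, pattern in matchers if pattern.search(text)}

# Hypothetical synonym lists based on the examples in Table 1.
synonyms = {
    "TP53": ["TP53", "p53"],
    "BAG1": ["BAG1", "BAG-1"],
    "WAS": ["WAS"],
}
matchers = build_matchers(synonyms, case_sensitive_terms={"WAS"})
keys = index_record("p53 was mutated", "BAG-1 was overexpressed; p53 status varied", matchers)
```

Here the ordinary word "was" does not trigger the WAS gene key because that term is matched case-sensitively, while "p53" and "BAG-1" are resolved to their primary keys.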

Medline records were searched with all qualified gene
identifiers, such as the official/preferred gene symbol, the
official/preferred gene name, all gene nicknames and all syntax
variants. In situations where there are several members of a
gene family or splice variants, some authors prefer to use a
shortened gene family name, e.g., estrogen receptor instead of
estrogen receptor 1 (ESR1), creating a source of false negatives.
For this reason, gene family stem terms were created for all
genes that have an alpha or numerical suffix (e.g., IL2RA, TGFβ,
ESR1, etc.) and then used to search the literature. The family
stem terms were handled separately from the specific gene 
names so that it would be clear when linkages were made to 
the gene family versus a specific member in that family. 
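
One way to derive family stem terms from gene names with a trailing numeric or alphabetic suffix is sketched below. The suffix rule (a trailing number, single letter, or spelled-out Greek letter) and all names are assumptions made for illustration; the text does not describe the rule MedGene actually applied. Stems are kept in a separate mapping so that a family-level match is never confused with a match to a specific member.

```python
import re

# Assumed suffix forms: digits, a single letter, or a spelled-out Greek letter.
_SUFFIX = re.compile(
    r"^(?P<stem>.+?)[\s,]+(?:\d+|[A-Za-z]|alpha|beta|gamma)$", re.IGNORECASE
)

def family_stem(gene_name):
    """Return the gene name without its trailing suffix, or None if it has none."""
    match = _SUFFIX.match(gene_name.strip())
    return match.group("stem") if match else None

def group_by_family(gene_names):
    """Map each family stem to the specific member names that share it."""
    families = {}
    for name in gene_names:
        stem = family_stem(name)
        if stem is not None:
            families.setdefault(stem, []).append(name)
    return families

families = group_by_family(
    ["estrogen receptor 1", "estrogen receptor 2", "tumor protein p53"]
)
```

With these inputs, both estrogen receptor genes fall under the stem "estrogen receptor", while "tumor protein p53" has no separable suffix and produces no stem term.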

To improve performance and accuracy, some pre-selection 
was applied to the records that were scanned. First, review 
articles were eliminated to avoid redundant treatment of 
citations. Second, non-English journals were removed because
the natural language filters were only relevant to English 
publications. Finally, journals unlikely to contain primary data 
about gene-disease relationships were also removed (e.g., Int.
J. Health Educ., Bedside Nurse, and J. Health Econ.). Together,
these filters reduced the 12 198 ZZ\ Medline publications (July 
2002) by 37%. 

Ranking the Relative Strengths of Gene-Disease Associations.
In total, there were 618 708 gene-disease co-citations,
in which 16% (8297) of all studied genes had been associated
to a disease and 96% (3875) of all diseases had been associated
to at least one gene. To rank the relative strengths of gene-disease
relationships, we tested several different statistical
methods and examined the results. With the exception of the
relative risk estimates, the methods provided similar results
with respect to the rank order of the gene-disease association 
strengths. However, after comparing the results to other 
databases and after consulting disease experts, the log of the 
product of frequency (LPF) was selected for further analysis 
because it gave the best results overall. 

Validation of MedGene. In developing this tool, it was
important to minimize the number of missed genes (false 
negatives) and miscalled genes (false positives). However, in 
situations when these goals were in conflict, inclusiveness was 
prioritized. To determine the false negative rate in MedGene,
breast cancer was used as a test case because it was associated
with more genes than any other human disease and because 




Figure 1. Estimation of the false negative rate by comparison 
with hand-curated databases. The breast cancer-related genes 
identified by MedGene were compared with those listed in 
several other databases including the Tumor Gene Database 
(TGD),* the Breast Cancer Gene Database (BCG), 1 GeneCards
(GC)" and Swissprot. 1 * Genes were considered false negatives 
if they were represented in at least one of these other databases 
and not in MedGene and their link to breast cancer was sup- 
ported by at least one literature reference. All literature references 
were verified by manual review to confirm their validity. The 
number of genes in each database or shared by more than one 
database is indicated. The false negative rate was calculated by 
genes missed at MedGene (26)/total number of nonoverlapping
genes in other databases (285).

there were several public databases that link genes to breast 
cancer. We compared the list of breast cancer-related genes 
from MedGene to these databases, illustrated in Figure 1.
Among the 285 distinct breast cancer-related genes that were
supported by at least one literature citation in these hand-curated
databases, 26 were absent from MedGene, suggesting
a false negative rate of approximately 9%. To determine why 
these were missed, all literature references for these genes (80 
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papers) were reviewed manually (see the Supporting Informa- 
tion, Supplemental Table 2, or visit
http://hipseq.med.harvard.edu/MedGene/publication/s_Table 2.html). Among
these papers, most false negatives were caused by nonstandard
gene terms or gene terms eliminated by our specificity filters.
Few genes were missed because they were only mentioned in
review papers (0.4%) or they appeared only in the body of the
manuscript but not the abstract or title (1.1%). Of note,
MedGene identified approximately 2000 additional breast
cancer-related genes not listed in any other database. 

To assess the false positive error rate, two complementary 
approaches were used: a detailed analysis of one disease and 
a global examination of 1000 diseases. The detailed approach 
examined the false positive error rate and its sources, whereas
the global approach tested whether the overall results made 
biomedical sense. 

Using the LPF, 1467 genes related to prostate cancer were
assembled in rank order. We then retrieved approximately 300 
Medline records each for the highest ranked 100 and the lowest 
ranked 200 genes and manually reviewed the titles and 
abstracts to determine the verity of the association. Nearly 80% 
of the highest ranked 100 genes fell into one of the five
categories that reflect meaningful gene-disease relationships
(see the Supporting Information, Supplemental Table 3, or visit
http://hipseq.med.harvard.edu/MedGene/publication/s_Table 3.html).
Among the lowest ranked 200 genes, approximately
70% reflected true relationships. Of the 600 records
reviewed, there were only two in which the association between
the gene and the disease was described as negative. Both were
genes with very low scores. In both cases, the authors did not
argue the absence of any relationship, but rather that a 
particular feature of the gene or protein was not shown to be 
related to human prostate cancer, 11 " 

The coincidence of some gene symbols with medical abbreviations,
chemical abbreviations and biological abbreviations
resulted in most of the false positives (see the Supporting
Information, Supplemental Table 4, or visit
http://hipseq.med.harvard.edu/MedGene/publication/s_Table 4.html),
emphasizing the importance of the filters that were added in the
search algorithm (Table 1). Without the filters, the false positive
rate more than doubled, and the false negative rate rose
dramatically (data not shown). For example, among the papers
about breast cancer, there were only 12 Medline records that
referred to ESR1 and 10 to ESR2, whereas almost 2000 papers
mentioned estrogen receptor without specifying ESR1 or ESR2;
this latter group was detected by the family stem term filter.

To further validate these results, a global analysis of the gene- 
disease relationships described by MedGene was performed. 
For this experiment, it was reasoned that the more closely 
related the diseases are to one another, the more they will be 
related to the same gene sets. Thus, if the relationships defined 
by MedGene accurately reflected the literature, then an unsu- 
pervised hierarchical clustering of the gene data should group 
diseases in a manner consistent with common medical thinking.
Conversely, if the clustered diseases do not make sense
biologically or medically, it may reflect excessive false positives, 
false negatives, or inappropriate scoring of the data. 

To execute this experiment, the gene sets and the corresponding
LPF values for 1000 randomly selected diseases (each
with at least 50 gene relationships) were used as a dataset for
clustering the diseases. A review of the results showed that the
resulting disease clusters were indeed logical based upon
common medical knowledge (see the Supporting Information,
Supplemental Figure 1, or visit
http://hipseq.med.harvard.edu/MedGene/publication/s_Figure_1.html).
For example, in one such cluster shown in Figure 2, diabetes and
its complications grouped together and were also closely linked
to diseases associated with starvation states.
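
The clustering experiment can be illustrated with a minimal sketch. The text does not state which distance measure or linkage rule was used, so the cosine distance and average linkage below are assumptions, and the gene scores in the toy profiles are invented for illustration only.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two sparse gene -> score dicts."""
    shared = set(a) & set(b)
    dot = sum(a[g] * b[g] for g in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def cluster_diseases(profiles, n_clusters):
    """Agglomerative clustering (average linkage, assumed) of diseases.

    profiles maps a disease name to its gene -> LPF-like score dict.
    Merging stops when n_clusters groups remain.
    """
    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            dists = [cosine_distance(profiles[x], profiles[y])
                     for x in clusters[i] for y in clusters[j]]
            avg = sum(dists) / len(dists)
            if best is None or avg < best[0]:
                best = (avg, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy profiles; the scores are made up and are not MedGene output.
profiles = {
    "diabetes mellitus": {"INS": 9.0, "GCG": 4.0, "LEP": 2.0},
    "diabetic retinopathy": {"INS": 7.0, "VEGF": 5.0, "GCG": 1.0},
    "starvation": {"INS": 3.0, "LEP": 6.0, "GCG": 3.0},
    "hypertension": {"REN": 8.0, "ACE": 7.0, "AGT": 6.0},
}
clusters = cluster_diseases(profiles, 2)
```

In this toy input the three blood-sugar-related terms share genes and merge into one group, while the hypertension profile, which shares no genes with them, stays separate.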

The number of genes associated with a given disease can
be estimated by adjusting the MedGene number up by the false
negative rate (~9%) and down by the false positive rate (~26%
on average). Using this, the average disease has 1037 ± 45.3
(mean ± s.d.) genes associated with it, although the range is
quite broad with 2359 genes related to breast cancer, 2122
genes related to lung cancer and no genes related to a number
of diseases.
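
The adjustment can be written out as simple arithmetic. The text does not give the exact formula, so the form below is one plausible reading and is an assumption: the reported count is first reduced by the false positive rate, then scaled up to account for the fraction of true associations that were missed. The function name and the example count of 1000 are illustrative.

```python
def estimate_true_gene_count(reported, false_positive_rate, false_negative_rate):
    """Estimate the true number of associated genes (assumed formula).

    reported * (1 - FP) keeps only the reported genes that are real;
    dividing by (1 - FN) adds back the real genes that were missed.
    """
    true_hits = reported * (1.0 - false_positive_rate)
    return true_hits / (1.0 - false_negative_rate)


# Hypothetical disease with 1000 reported genes, using the rates quoted above.
estimate = estimate_true_gene_count(1000, 0.26, 0.09)
```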

Applying MedGene to the Analysis of Large Datasets. Access 
to a comprehensive summary of the genes linked to human 
diseases provided an opportunity to analyze data obtained from 
a high-throughput experiment. We compared the MedGene 
breast cancer gene list to a gene expression data set generated
from a micro-array analysis comparing breast cancer and
normal breast tissue samples. Micro-array analysis identified
2286 genes that had greater than a 1-fold difference in mean
expression level between breast cancer samples and normal
breast samples. Using MedGene, we sorted the 2286 genes into
four classes: 555 genes directly linked to breast cancer in the
literature by gene term search (first-degree association by gene
name); 328 genes directly linked by family term search
(first-degree association by family term); 1021 genes linked to breast
cancer only through other breast cancer genes (second-degree
association); and 505 genes not previously associated with
breast cancer. (See the Supporting Information, Supplemental
Figure 2, or visit
http://hipseq.med.harvard.edu/MedGene/publication/s_Figure_2.html.)
Among the 505 previously unrelated genes, 467 were either newly
identified genes or genes that had not previously been associated
with any disease.
Among the remaining 38 genes, 9 had been related to other 
cancers, specifically esophageal, colon, uterine, skin, and cervix. 
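
The four-way sort described above can be sketched as a lookup against three literature-derived gene sets. The function name, the precedence order (gene term before family term before second degree) and the placeholder gene identifiers are assumptions made for illustration.

```python
def classify_genes(array_genes, gene_term_hits, family_term_hits, second_degree_hits):
    """Sort micro-array genes into the four literature classes.

    A gene is placed in the first class it matches, checked in the
    order: gene term, family term, second degree, unassociated.
    """
    classes = {
        "first_degree_gene": [],
        "first_degree_family": [],
        "second_degree": [],
        "unassociated": [],
    }
    for gene in array_genes:
        if gene in gene_term_hits:
            classes["first_degree_gene"].append(gene)
        elif gene in family_term_hits:
            classes["first_degree_family"].append(gene)
        elif gene in second_degree_hits:
            classes["second_degree"].append(gene)
        else:
            classes["unassociated"].append(gene)
    return classes


# Placeholder identifiers, not real MedGene results.
result = classify_genes(
    array_genes=["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
    gene_term_hits={"GENE_A"},
    family_term_hits={"GENE_A", "GENE_B"},
    second_degree_hits={"GENE_C"},
)
```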

To determine whether the genes highlighted by the micro- 
array analysis were more likely to have been previously linked 
to breast cancer in the literature, we created a two-dimensional
plot of the fold change of expression level between breast
cancer and normal tissue versus the literature score (LPF)
(Figure 3A). There was a broad spread of expression changes 
among the genes directly linked to breast cancer ranging from 
less than 1-fold change (68%) to over 40-fold (0.3%). Notably, 
the majority of genes with greater than 10-fold expression 
changes were linked to breast cancer by first-degree associa- 
tion. 

Among all 754 genes directly linked to breast cancer in the
literature, there was no correlation between LPF and micro-array
fold change (r = 0.018, p-value = 0.62). However, when
we stratified the analysis based on the magnitude of the fold
change, we observed an increasing trend in correlation (Figure
3B) suggesting that genes with a more substantial change in
expression level were more likely to have a stronger association
in the literature. For genes that had 10-fold change or more in
expression level, the correlation increased to 0.41 (p-value =
0.05).
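
A minimal sketch of the stratified analysis follows. It computes a Spearman rank correlation (the statistic named in the Figure 3 legend) between absolute fold change and literature score for genes at or above each fold-change threshold. Using cumulative thresholds rather than disjoint bins is an assumption, and the toy records are invented.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def stratified_correlation(records, thresholds):
    """Correlate |fold change| with score for genes with |fold| >= threshold.

    records is a list of (fold_change, literature_score) pairs.
    Strata with fewer than three genes are reported as None.
    """
    results = {}
    for t in thresholds:
        subset = [(abs(f), s) for f, s in records if abs(f) >= t]
        if len(subset) < 3:
            results[t] = None
        else:
            results[t] = spearman([f for f, _ in subset], [s for _, s in subset])
    return results


# Invented (fold change, literature score) pairs for illustration.
records = [(3.5, 2.0), (4.0, 0.5), (6.0, 1.5), (12.0, 1.0), (20.0, 2.5), (45.0, 4.0)]
trend = stratified_correlation(records, [3, 10])
```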

When we evaluated the micro-array data separately for ER
positive and ER negative tumors, the trend in correlation
between fold change and literature score was highly dependent
on estrogen receptor status. Interestingly, there was a similar
trend in correlation for ER positive tumors, but no trend in
correlation for ER negative tumors.
Figure 2. Global validation by clustering analysis. 2(A). The gene sets and the corresponding LPF values for 1000 diseases, each with
at least 50 gene relationships, were used in an unsupervised clustering of the diseases based on the gene patterns associated with
them. A sample of the data is shown here. 2(B). One of the resulting clusters is shown that corresponds to blood sugar states. Diabetes
terms (above the line) and starvation states terms (under the line) clustered together. Within these groups, there is also clustering of
diabetic small vessel complications, altered serum chemistries, nutritional disorders, etc. (Supplemental Figure 1:
http://hipseq.med.harvard.edu/MedGene/publication/s_Figure_1.html).




Finally, to validate our findings, we computed similar
correlations between the breast cancer expression data and
LPF scores generated by MedGene for hypertension, a
disease unrelated to breast cancer. As expected, we did not
observe an increasing trend in correlation for hypertension.
Figure 3. Relationship between literature score and functional data for breast cancer. 3A. The data from an expression analysis of
samples for breast tumors and normal breast tissue were analyzed to indicate the fold difference of expression level between breast
tumor and normal sample (cutoff > 3-fold change). The fold changes were plotted against the literature score for the same gene set.
Green dots represent first-degree association by gene search, blue dots represent first-degree association by family search and red
dots represent no association. Some well-studied genes, such as BRCA2 (pink circle), are not reflected by a substantial difference in
expression level. Furthermore, the majority of genes that have no association with breast cancer in the literature had less than 10-fold
expression changes (shaded area). 3B. The Spearman rank-correlation coefficients between literature score (LPF) and the fold change
of expression level between tumor and normal breast samples (y-axis) in relation to the amount of fold change of expression level
(x-axis). Gene rank lists were generated for breast cancer (blue) and hypertension (pink). Correlations were also computed between
the breast cancer gene LPF scores and fold change expression data among estrogen receptor positive tumors only (light blue) and
estrogen receptor negative tumors only (purple).
breast neoplasms: estrogen receptor, PGR, ERBB2, BRCA1, BRCA2,
EGFR, CYPW, TFF1, PSEN2, TP53, CES3, CEACAM5, ERBB3, cyclin,
COX5A, cathepsin, ERBB4, TRAM, CCND1, EGF, MVCt, insulin-like,
BCL2, mucin, FGF3

hypertension, rheumatoid arthritis (columns interleaved): REN, RA,
DBF, TNFRSF10A, IMP, CRP, ACT, AS, INS, ESR1, kallikrein,
HLA-DRB1, ACE, DR1, endothelin, interleukin, S100A8, TNF, BDK, US,
DIANPH, collagen, SARI, IL1A, PIH, ACR, CD59, TNFRSF12, ALB, 112,
CYP11B2, CHI3L1, MAT2B, IL8, angiotensin, interleukin 1, receptor,
matrix, ACTR2, metalloproteinase, NPPA, interferon, IVM, CD68,
O&H, IL4, NPY, IL17, POMC, MMP3, neuropeptide, SIL

bipolar disorder, atherosclerosis (columns interleaved): ERDA1,
apolipoprotein, SNAP20, APOE, PPKL, LDLR, ELN, TRH, ARCl, IMPA2,
APOB, HTR3A, APOA1, DRD3, MSRS, REM, LPl, KCNN3, pom,
plasminogen, DRD4, activator inhibitor, HTR2C, PLC, vascular cell,
REIN, adhesion molecule, DBH, ATOH1, MAOA, VWF, COMT, INS, HTR2A,
ARC2, SYNJ1, ABCA1, INPP1, OLR1, NEDD4L, collagen, FRA13C, MCP,
transducer or, ERBB2, lipoprotein, BAIAP3, APOA2, intercellular,
ATP1B3, adhesion molecule, DRD5, RAB27A



MedGene/). 



Discussion 

The Human Genome Project heralded a new era in biological 
research where the emphasis on understanding specific path- 
ways has expanded to global studies of genomic organization 
and biological systems. High-throughput technologies can 
provide novel insight into comprehensive biological function 
but also introduce new challenges. The utility of these
technologies is limited by the ability to generate, analyze, and
interpret large gene lists. MedGene, a relational database
derived by mining the information in Medline, was created to
address this need. MedGene users can query for a rank-ordered
list of human gene-disease relationships (Table 2) for one or
more diseases. Each entry is hyperlinked to the original papers 
supporting each association and to other relevant databases. 

MedGene is an innovative extension of previous text mining
approaches. Perez-Iratxeta et al. used the GO annotation and
their chromosomal locations to predict genes that may
contribute to inherited disorders. 8 MedGene takes a broader view
and includes all diseases and all possible gene-disease
relationships. Furthermore, MedGene utilizes co-citation to indicate a
relationship rather than GO annotation, which is limited to the
subset of genes that have GO annotation. Our approach is
complementary to that taken by Chaussabel and Sher, who 
used the frequency of co-cited terms to cluster genes into a 
hierarchy of gene-gene relationships. 6 

A unique aspect of this tool is the ability to assess the relative
strengths of gene-disease relationships based on the frequency
of both co-citation and single citation. This presupposes that
most co-citations describe a positive association, often referred
to as publication bias, and is supported by our observations
that negative associations are rare (Supplemental Table 3:
http://hipseq.med.harvard.edu/MedGene/publication/s_Table_3.html).
Of course, relationships established by frequency of co-citation
do not necessarily represent a true biological link; however, it
is strong evidence to support a true relationship.
Another important feature of MedGene is the implementa- 
tion of software filters that substantially reduced the error rate. 
We estimate that less than 10% of all associations were missed 
and at least 70% of even the weakest associations were real. 
For this study, all of the filters that we applied were general
ones, e.g., expanding the list of all gene names to address the
different syntax forms used by different journals, eliminating
gene names that correspond to common English words, etc.
The majority of the remaining search term ambiguities were
idiosyncratic and difficult to identify systematically without
causing a significant rise in false negatives. Alternative
approaches, such as the examination of the nearest neighbor
terms, need to be considered to further reduce the false positive
rate.
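
The two general filters named above can be sketched as follows. This is an illustration only: the variant patterns (a space or hyphen before a trailing number) and the short word list are assumptions, not the filters actually implemented in MedGene.

```python
import re

# Hypothetical stop list of gene symbols that are also common English words.
COMMON_WORDS = {"was", "for", "at", "an", "not"}


def name_variants(symbol):
    """Expand a symbol ending in digits into assumed spelling variants.

    For example ESR1 becomes ESR1, ESR 1 and ESR-1.
    """
    variants = {symbol}
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", symbol)
    if match:
        stem, number = match.groups()
        variants.add(f"{stem} {number}")
        variants.add(f"{stem}-{number}")
    return variants


def build_search_terms(symbols):
    """Drop symbols that are common words, then expand the rest."""
    terms = {}
    for symbol in symbols:
        if symbol.lower() in COMMON_WORDS:
            continue
        terms[symbol] = sorted(name_variants(symbol))
    return terms


terms = build_search_terms(["ESR1", "WAS", "TP53"])
```

Dropping an ambiguous symbol outright trades false positives for false negatives, which is the tension the paragraph above describes.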

It is not uncommon to see expression changes in micro-array
experiments as small as 2-fold reported in the literature.
Even when these expression changes are statistically significant,
it is not always clear if they are biologically meaningful. When
comparing expression levels of disease to normal tissue, one
expects an enrichment of known disease-related genes to
appear in the altered expression group. MedGene provided a
unique opportunity to test this notion in the context of existing
knowledge on a novel breast cancer micro-array dataset. For
genes displaying a 5-fold change or less in tumors compared
to normal, there was no evidence of a correlation between
altered gene expression and a known role in the disease. This
Table 3. Genes with Large Expression Changes in ER- but
Not in ER+ Breast Tumors

gene symbol

KRTHB1
BRS3
DKK1
ZIC1
TLR1
KIAA0680
CDKN3
EBI2
GZMB
STK18
GPR49
MYOW
LAD1
POLE2
HMG4
BCL2L11
LRP8
cemz
CCNE2
FOB
KNSlM
HIF5
SERPINH2
YAP1
LPtlB
TCEA2
TFF1
COL17A1
POP5
BPAG1
PDZK1
VEGFC
MUC6
SERPINA5
M&Sl
CA12

Table 3. MedGene identified a set of relatively understudied, yet highly
expressed genes in ER negative, but not ER positive breast tumors. All of
these genes have either never been co-cited with breast cancer or have a
weak association except those marked with an *.

reflects the many genes whose role in breast cancer may not
involve large changes in expression in sporadic tumors (e.g.,
BRCA1 and BRCA2) and genes whose modest changes in
expression may be unrelated to the disease. Strikingly, among 
genes with a 10-fold change or more in expression level, there 
was a strong and significant correlation between expression 
level and a published role in the disease, providing the first 
global validation of the micro-array approach to identifying 
disease-specific genes. 

The results derived from MedGene have two implications. 
First, a careful hunt for corroborating evidence of a role in 
breast cancer should precede any further study of genes with 
less than 5-fold expression level changes. Second, any genes 
with 10-fold changes or more are likely to be related to breast 
cancer and warrant attention. It is likely that this threshold will 
change depending on the disease as well as the experiment. 

Interestingly, the observed correlation was only found among
ER-positive tumors, not ER-negative. This may reflect a bias
in the literature to study the more prevalent type of tumor in
the population. Furthermore, this emphasizes that caution
must be taken when interpreting experiments that may contain
subpopulations that behave very differently. The MedGene
approach identified a set of relatively understudied, yet highly
expressed genes in ER-negative tumors that are worthy of
further examination (Table 3).
fold change (ER+)    fold change (ER-)

   1.0                 610.8
   1.2                  89.4
   1.2                  69.8
   1.9                  59.6
   1.0                  38.5
   2.6                  33.2
   1.0                  30.6
   4.0                  27.9
   3.8                  21.9
   4.7                  18.6
   1.0                  14.6
   1.6                  14.4
  -1.0                  13.5
   4.2                  13.0
   4.4                  12.9
  -1.2                  12.3
   2.9                  12.2
   1.0                  11.8
   4.0                  11.6
  -4.3                  11.1
 * 2.9                  10.9
   3.0                  10.2
   4.6                  10.2
   1.0                  10.0
  -1.3                 -10.4
  -1.1                 -10.8
   1.3                 -11.4
  -4.1                 -15.7
   1.1                 -16.2
  -4.6                 -22.3
  -1.1                 -36.8
  -2.8                 -51.5
  -1.4                 -64.9
  -1.0                 -83.1
  -1.6                 -85.9
   2.4                -150.3
In conclusion, we have developed an automated method of
summarizing and organizing the vast biomedical literature. To
our knowledge, the resulting database is the most comprehensive
and accurate of its kind. By generating a score that reflects
the strength of the association, it provides an important tool
for the rapid and flexible analysis of large datasets from various
high-throughput screening experiments. Furthermore, it can
be used for selecting subsets of genes for functional studies,
for building disease-specific arrays, for looking at genes common
to multiple diseases and various other high-throughput
applications. In the future, it will be possible to enhance the
utility of the MedGene database by building links between
genes and other MeSH terms as well as other biological
processes and concepts, such as cell division and responses to
small molecules.
human disease category headings along with all of their child
terms selected from the 2002 MeSH index (Supplemental Table
1); analysis of the causes of false negatives in MedGene
(Supplemental Table 2); meaningful gene-disease relationships
found in MedGene (Supplemental Table 3); causes for incorrect
assignment of gene indexes (Supplemental Table 4); a review 
of the results, showing that the resulting disease clusters were 
indeed logical (Supplemental Figure 1); and a review of the 
results showing that among the 505 previously unrelated genes, 
467 were either newly identified genes or genes that had not 
previously been associated with any disease (Supplemental 
Figure 2). This material is available free of charge via the 
Internet at http://pubs.acs.org and at the web sites mentioned 
in the text. 
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